Abstract

In a convolution model, we observe random variables whose distribution is the
convolution of some unknown density $f$ and some known or partially known noise
density $g$. In this paper, we focus on statistical procedures which are adaptive with
respect to the smoothness parameter $\tau$ of the unknown density $f$, and also (in some
cases) to some unknown parameter of the noise density $g$.

In a first part, we assume that $g$ is known and polynomially smooth. We provide
goodness-of-fit procedures for the test $H_0 : f = f_0$, where the alternative $H_1$
is expressed with respect to the $L_2$-norm (i.e. has the form $\psi_n^{-2} \|f - f_0\|_2^2 \ge C$). Our
adaptive (w.r.t. $\tau$) procedure behaves differently according to whether $f_0$ is polynomially
or exponentially smooth. A payment for adaptation is noted in both cases
and, to compute it, we provide a non-uniform Berry-Esseen type theorem for
degenerate $U$-statistics. In the first case we prove that the payment for adaptation
is optimal (thus unavoidable).

In a second part, we study a wider framework: a semiparametric model, where $g$ is
exponentially smooth and stable, and its self-similarity index $s$ is unknown. In order
to ensure identifiability, we restrict our attention to polynomially smooth, Sobolev-type
densities $f$. In this context, we provide a consistent estimation procedure for
$s$. This estimator is then plugged into three different procedures: estimation of the
unknown density $f$, estimation of the functional $\int f^2$ and test of the hypothesis $H_0$. These
procedures are adaptive with respect to both $s$ and $\tau$ and attain the rates which
are known to be optimal for known values of $s$ and $\tau$. As a by-product, when the noise
is known and exponentially smooth, our testing procedure is adaptive for testing
Sobolev-type densities.

Résumé

In a convolution model, the observations are real random variables whose
distribution is the convolution of an unknown density $f$ and a noise density $g$
assumed either entirely known or known only up to a parameter. We study several
statistical procedures that adapt automatically to the smoothness parameter $\tau$
of the unknown density $f$ and (in some cases) to the unknown parameter of the
noise density.

In a first part, we assume that $g$ is known and polynomially smooth. We propose
a goodness-of-fit test of the hypothesis $H_0 : f = f_0$ when the alternative $H_1$ is
expressed in terms of the $L_2$-norm (i.e. has the form $\psi_n^{-2} \|f - f_0\|_2^2 \ge C$).
This procedure is adaptive (with respect to $\tau$) and exhibits different testing
rates ($\psi_n$) depending on the type of smoothness of $f_0$ (polynomial or
exponential). Adaptivity induces a loss in the testing rate, which is computed
thanks to a non-uniform Berry-Esseen type theorem for degenerate $U$-statistics.
In the case of polynomial smoothness of $f_0$, we prove that this loss is
unavoidable and therefore optimal.

In a second part, we consider the wider framework of a semiparametric model,
where $g$ is the density of a stable law (exponential-type smoothness) with
unknown self-similarity index $s$. To ensure identifiability of the model, the
density $f$ is assumed to belong to a Sobolev space (polynomial smoothness). In
this framework, we propose a consistent estimator of $s$. It is then plugged into
three different procedures: estimation of $f$, estimation of the functional
$\int f^2$ and test of the hypothesis $H_0$. These procedures are adaptive with respect
to $s$ and $\tau$ and attain the optimal rates of the case where $s$ and $\tau$ are known.
Finally, when $g$ is known and exponentially smooth, a consequence of our result is
that this testing procedure is adaptive when $f_0$ belongs to a Sobolev space.

Key words: Adaptive nonparametric tests, convolution model, goodness-of-fit 
tests, infinitely differentiable functions, partially known noise, quadratic functional 
estimation, Sobolev classes, stable laws 
1 Introduction



Convolution model 

Consider the convolution model where the observed sample $\{Y_j\}_{1 \le j \le n}$ comes
from the independent sum of independent and identically distributed (i.i.d.)
random variables $X_j$ with unknown density $f$ and Fourier transform $\Phi$ and
i.i.d. noise variables $\varepsilon_j$ with known (maybe only up to a parameter) density $g$
and Fourier transform $\Phi^g$:

$Y_j = X_j + \varepsilon_j, \quad 1 \le j \le n.$ (1)

The density of the observations is denoted by $p$ and its Fourier transform by $\Phi^p$.
Note that we have $p = f * g$, where $*$ denotes the convolution product, and
$\Phi^p = \Phi \cdot \Phi^g$.
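As an illustration of model (1), the following minimal sketch simulates noisy observations and checks numerically that the empirical characteristic function of the $Y_j$ is close to the product $\Phi \cdot \Phi^g$. Everything specific here is an assumption made for the example and does not come from the paper: a standard Gaussian $f$, Laplace noise with $\Phi^g(u) = 1/(1 + u^2)$, the sample size, the seed and the helper names.

```python
import cmath
import math
import random


def simulate_observations(n, rng):
    """Draw Y_j = X_j + eps_j with X_j ~ N(0, 1) and eps_j ~ Laplace(0, 1).

    Both distributions are illustrative assumptions, not the paper's choice.
    """
    ys = []
    for _ in range(n):
        x = rng.gauss(0.0, 1.0)
        # A Laplace(0, 1) variable is the difference of two Exp(1) variables.
        eps = rng.expovariate(1.0) - rng.expovariate(1.0)
        ys.append(x + eps)
    return ys


def empirical_cf(u, ys):
    """Empirical characteristic function (1/n) * sum_j exp(i u Y_j)."""
    return sum(cmath.exp(1j * u * y) for y in ys) / len(ys)


def theoretical_cf(u):
    """Phi^p(u) = Phi(u) * Phi^g(u) for the Gaussian/Laplace pair above."""
    return math.exp(-0.5 * u * u) / (1.0 + u * u)


rng = random.Random(0)
sample = simulate_observations(20000, rng)
# Distance between the empirical and the theoretical transform at u = 1.
gap = abs(empirical_cf(1.0, sample) - theoretical_cf(1.0))
```

The gap is of the order of $n^{-1/2}$, which is why the transform of the observations can only be used reliably at frequencies where $|\Phi^p(u)|$ is not too small.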

The underlying unknown density $f$ is always supposed to belong to $L_1 \cap L_2$.
We shall consider probability density functions belonging to the class

$\mathcal{F}(\alpha, r, \beta, L) = \left\{ f : \mathbb{R} \to \mathbb{R}_+,\ \int f = 1,\ \frac{1}{2\pi} \int |\Phi(u)|^2 |u|^{2\beta} \exp(2\alpha |u|^r)\, du \le L \right\},$ (2)

for $L$ a positive constant, $\alpha > 0$, $0 \le r \le 2$, $\beta \ge 0$ and either $r > 0$ or $\beta > 0$.
Note that the case $r = 0$ corresponds to Sobolev densities whereas $r > 0$
corresponds to infinitely differentiable (or supersmooth) densities.

We consider noise distributions whose Fourier transform does not vanish on
$\mathbb{R}$: $\Phi^g(u) \ne 0$, $\forall u \in \mathbb{R}$. Typically, nonparametric estimation in convolution
models gives rise to the distinction of two different behaviours for the noise
distribution. We shall alternatively consider (for some constant $c_g > 0$),

polynomially smooth (or polynomial) noise

$|\Phi^g(u)| \sim c_g |u|^{-\sigma}, \quad |u| \to \infty, \quad \sigma > 1;$ (3)

exponentially smooth (or supersmooth or exponential) stable noise

$|\Phi^g(u)| = \exp(-\gamma |u|^s), \quad |u| \to \infty, \quad \gamma, s > 0.$ (4)

In this second case, the parameter $s$ is called the self-similarity index of the
noise density and we shall consider that it is unknown.

Convolution models have been widely studied over the past two decades. We
will be interested here both in estimation of the unknown density $f$ and in
testing the hypothesis $H_0 : f = f_0$, with a particular interest in adaptive
procedures. Our first purpose is to provide goodness-of-fit testing procedures
on $f$, for the test of the hypothesis $H_0 : f = f_0$, which are adaptive with respect
to the unknown smoothness parameter of $f$. The second one is to study the
behaviour of different procedures (such as estimation of $f$, estimation of $\int f^2$
and goodness-of-fit test) in a setup where the self-similarity index $s$ is unknown.



Adaptive procedures in the convolution model 



Concerning estimation, the asymptotically minimax setup in the context of 
pointwise or L p -norms and in the case of entirely known noise density g is 
the most studied one. Major results in this direction prove that the smoother 
the error density, the slower the optimal rates of convergence (see 0], jjj, 
3, concerning polynomial noise and 0, 0, B 

for exponential noise). 



Adaptive estimation procedures were considered first by [21] and then by [11 



They constructed wavelet estimators which do not depend on the smoothness
parameter of the density $f$ to be estimated. Adaptive kernel estimators were
given in B. A different adaptive approach is used in relying on penalized 
contrast estimators. 



Nonparametric goodness-of-fit testing has been extensively studied in the context
of direct observations (namely a sample distributed from the density $f$
to be tested), but also for regression or in the Gaussian white noise model.

We refer to [18], [16] for an overview on the subject. The convolution model



provides an interesting setup where observations may come from a signal ob- 
served through some noise. 



Nonparametric goodness-of-fit tests in convolution models were studied in [15]
and in [3]. The approach used in [3] is based on a minimax point of view combined
with estimation of the quadratic functional $\int f^2$. Assuming the smoothness
parameter of $f$ to be known, the authors of [15] define a version of the
Bickel-Rosenblatt test statistic and study its asymptotic distribution under
the null hypothesis and under fixed and local alternatives, while [3] provides a
different goodness-of-fit testing procedure attaining the minimax rate of test- 
ing in each of the three following setups: Sobolev densities and polynomial 
noise, supersmooth densities and polynomial noise, Sobolev densities and ex- 
ponential noise. The case of supersmooth densities and exponential noise is 
also studied but the optimality of the procedure is not established in the case 
r > s. 



Our first goal here is to provide adaptive versions of these last procedures with
respect to the parameters $(\alpha, r, \beta)$. We restrict our attention to testing problems
where alternatives are expressed with respect to the $L_2$-norm. Namely, the
alternative has the form $H_1 : \psi_n^{-2} \|f - f_0\|_2^2 \ge C$. In such a case, the problem
relates to asymptotically minimax estimation of $\int f^2$.



Our second goal is to deal with the case of a not entirely known noise distribution.
This is a crucial issue, as assuming this noise distribution to be entirely
known is not realistic in many situations. However, in general, the noise density
$g$ has to be known for the model to be identifiable. Nevertheless, when the
noise density is exponentially smooth and the unknown density is restricted
to be less smooth than the noise, semiparametric models are identifiable and
they may be considered. The case of a Gaussian noise with unknown variance
$\gamma$ and unknown density without Gaussian component was first considered
in [19]. She proposes an estimator of the parameter $\gamma$ which is then plugged into
an estimator of the unknown density. This work is generalized in [4] to exponentially
smooth noise with unknown scale parameter $\gamma$ and unknown densities
belonging either to Sobolev classes, or to classes of supersmooth densities with
parameter $r$, $r < s$. Minimax rates of convergence are exhibited. In this context,
the unknown parameter $\gamma$ acts as a real nuisance parameter, as the rates
of convergence for estimating the unknown density are slower compared to
the case of known scale, those rates being nonetheless optimal in a minimax
sense. Another attempt to remove knowledge on the noise density appears in 
2p| where the author studies a deconvolution estimator associated to a proce- 
dure for selecting the error density between the Normal supersmooth density 
and the Laplace polynomially smooth density (both with fixed parameter val- 
ues). 



In the second part of our work, we will be interested in estimation procedures
on $f$, adaptive both with respect to the smoothness parameter of $f$ and to an
unknown parameter of the noise density. More precisely, in the specific setup
of Sobolev densities and exponential noise with symmetric stable distribution,
we will consider the case of unknown self-similarity index $s$. In this context,
we first propose an estimator of the self-similarity index $s$, which, plugged
into kernel procedures, provides estimators of the unknown density $f$ with the
same optimal rate of convergence as in the case of entirely known noise density.
Using the same techniques, we also construct an estimator of the quadratic
functional $\int f^2$ (with optimal rate of convergence) and an $L_2$ goodness-of-fit test
statistic. Note that this work is very different from as the self similarity 
index $s$ plays a different role from the scale parameter $\gamma$ previously studied.
In particular, the range of applications of those results is entirely new. 
Notation, definitions, assumptions



In the sequel, $\|\cdot\|_2$ denotes the $L_2$-norm, $\overline{M}$ is the complex conjugate of $M$ and
$\langle M, N \rangle = \int M(x) \overline{N}(x)\, dx$ is the scalar product of complex-valued functions
in $L_2(\mathbb{R})$. Moreover, probability and expectation with respect to the distribution
of $Y_1, \dots, Y_n$ induced by the unknown density $f$ will be denoted by $P_f$
and $E_f$.

We denote more generally by $\tau = (\alpha, r, \beta)$ the smoothness parameter of the
unknown density $f$ and by $\mathcal{F}(\tau, L)$ the corresponding class. As the density $f$ is
unknown, the a priori knowledge of its smoothness parameter $\tau$ could appear
unrealistic. Thus, we assume that $\tau$ belongs to a closed subset $T$, included in
$(0, +\infty) \times (0, 2] \times (0, +\infty)$. For a given density $f_0$ in the class $\mathcal{F}(\tau_0)$, we want
to test the hypothesis

$H_0 : f = f_0$

from observations $Y_1, \dots, Y_n$ given by (1). We extend the results of [3] by
giving the family of sequences $\Psi_n = \{\psi_{n,\tau}\}_{\tau \in T}$ which separates (with respect
to the $L_2$-norm) the null hypothesis from a larger alternative

$H_1(C, \Psi_n) : f \in \bigcup_{\tau \in T} \{ f \in \mathcal{F}(\tau, L) \text{ and } \psi_{n,\tau}^{-2} \|f - f_0\|_2^2 \ge C \}.$

We recall that the usual procedure is to construct, for any $0 < \epsilon < 1$, a test
statistic $\Delta_n^*$ (an arbitrary function, with values in $\{0, 1\}$, which is measurable
with respect to $Y_1, \dots, Y_n$ and such that we accept $H_0$ if $\Delta_n^* = 0$ and reject it
otherwise) for which there exists some $C^0 > 0$ such that

$\limsup_{n \to \infty} \left\{ P_0[\Delta_n^* = 1] + \sup_{f \in H_1(C, \Psi_n)} P_f[\Delta_n^* = 0] \right\} \le \epsilon,$ (5)

holds for all $C > C^0$. This part is called the upper bound of the testing rate.
Then, one proves the minimax optimality of this procedure, i.e. the lower bound

$\liminf_{n \to \infty} \inf_{\Delta_n} \left\{ P_0[\Delta_n = 1] + \sup_{f \in H_1(C, \Psi_n)} P_f[\Delta_n = 0] \right\} \ge \epsilon,$ (6)

for some $C_0 > 0$ and for all $0 < C < C_0$, where the infimum is taken over all
test statistics $\Delta_n$.

Let us first remark that as we use noisy observations (and unlike what happens
with direct observations), this test cannot be reduced to testing uniformity of
the distribution density of the observed sample (i.e. $f_0 = 1$ with support on the
finite interval [0; 1]). As a consequence, additional assumptions used in 0] on 
the tail behaviour of $f_0$ (ensuring it does not vanish arbitrarily fast) are needed
to obtain the optimality result of the testing procedure in the case of Sobolev
density ($r = 0$) observed with polynomial noise ((T) and (P)), respectively
with exponential noise ((T) and (E)). We recall these assumptions here for
the reader's convenience.



Assumption (T) 

Co 



3c > 0,Vx e R,f (x) > 



1 + \x\ 



Moreover, we also need to control the derivatives of the known Fourier transform
$\Phi^g$ when establishing optimality results.

Assumption (P) (Polynomial noise) If the noise satisfies (3), then assume
that $\Phi^g$ is three times continuously differentiable and there exist $A_1, A_2$ such
that

I WO) I < t-^t and \(&)"(u)\ < | U | - oo. 

Assumption (E) (Exponential noise) If the noise satisfies (4), then assume
that $\Phi^g$ is continuously differentiable and there exist some constants $C > 0$
and $A_3 \in \mathbb{R}$ such that

$|(\Phi^g)'(u)| \le C |u|^{A_3} \exp(-\gamma |u|^s), \quad |u| \to \infty.$

Remark 1 Similar results may be obtained when we assume the existence of
some $p > 1$ such that $f_0(x)$ is bounded from below by $c_0 (1 + |x|^p)^{-2}$ for large
enough $x$. In such a case, the Fourier transform $\Phi^g$ of the noise density is
assumed to be $p$ times continuously differentiable, with derivatives up to order
$p$ satisfying the same kind of bounds as in Assumption (P) when the noise is
polynomial, respectively as in Assumption (E) when the noise is exponential.



Roadmap 



Section 2 deals with the case of (known) polynomial noise. We provide a
goodness-of-fit testing procedure for the test $H_0 : f = f_0$, in two different
cases: the density $f_0$ to be tested is either ordinary smooth ($r = 0$) or supersmooth
($r > 0$). The procedures are adaptive with respect to the smoothness
parameter $(\alpha, r, \beta)$ of $f$. The proof of the upper bounds for the testing rate
relies mainly on a Berry-Esseen inequality for degenerate $U$-statistics of order
2, postponed to Section 4. In some cases, a loss for adaptation is noted with
respect to known testing rates for fixed known parameters. When the loss is
of order $\log\log n$ to some power, we prove that this payment is unavoidable.

In Section 3, we consider exponential noise of symmetric stable law with unknown
self-similarity index $s$. In order to ensure identifiability, we restrict our
attention to Sobolev classes of densities $f$. The first step (Section 3.1) is to
provide a consistent estimation procedure for the self-similarity index. Then
(Section 3.2), using a plug-in, we introduce a new kernel estimator of $f$ where
both the bandwidth and the kernel are data dependent. We also introduce
an estimator of the quadratic functional $\int f^2$ with sample dependent bandwidth
and kernel. We prove that these two procedures attain the same rates of
convergence as in the case of entirely known noise distribution, and are thus
asymptotically optimal in the minimax sense. We also present a goodness-of-fit
test on $f$ in this setup. We prove that the testing rate is the same as in the
case of entirely known noise distribution and thus asymptotically optimal in
the minimax sense. Proofs are postponed to Section 5.



2 Polynomially smooth noise 

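In this section, we shall assume that the noise density $g$ is polynomial (3).
The unknown density $f$ belongs to the class $\mathcal{F}(\alpha, r, \beta, L)$. We are interested in
goodness-of-fit testing procedures that are adaptive with respect to the parameter
$\tau = (\alpha, r, \beta)$. We assume that this unknown parameter belongs to the following
set

$T = \{ \tau = (\alpha, r, \beta) ;\ \tau \in [\underline{\alpha}; +\infty) \times [\underline{r}; \bar{r}] \times [\underline{\beta}; \bar{\beta}] \},$

where $\underline{\alpha} > 0$, $0 \le \underline{r} \le \bar{r} \le 2$, $0 \le \underline{\beta} \le \bar{\beta}$ and either $\underline{r} > 0$ and $\alpha \in [\underline{\alpha}, \bar{\alpha}]$ or
both $\underline{r} = \bar{r} = 0$ and $\underline{\beta} > 0$.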

Let us introduce some notation. We consider a preliminary kernel $J$, with
Fourier transform $\Phi^J$, defined by

$\forall x \in \mathbb{R},\ J(x) = \frac{\sin(x)}{\pi x}, \qquad \forall u \in \mathbb{R},\ \Phi^J(u) = 1_{|u| \le 1},$

where $1_A$ is the indicator function of the set $A$. For any bandwidth $h = h_n \to 0$
as $n$ tends to infinity, we define the rescaled kernel $J_h$ by

$\forall x \in \mathbb{R},\ J_h(x) = h^{-1} J(x/h) \quad \text{and} \quad \forall u \in \mathbb{R},\ \Phi^{J_h}(u) = \Phi^J(hu) = 1_{|u| \le 1/h}.$

Now, the deconvolution kernel $K_h$ with bandwidth $h$ is defined via its Fourier
transform $\Phi^{K_h}$ as

$\Phi^{K_h}(u) = (\Phi^g(u))^{-1} \Phi^J(uh) = (\Phi^g(u))^{-1} \Phi^{J_h}(u), \quad \forall u \in \mathbb{R}.$ (7)

In Section 3.2 we will consider a modification of this kernel to take into
account the case of not entirely known noise density $g$.

Next, the quadratic functional $\int (f - f_0)^2$ is estimated by the statistic $T_{n,h}$

$T_{n,h} = \frac{2}{n(n-1)} \sum\sum_{1 \le k < j \le n} \langle K_h(\cdot - Y_k) - f_0,\ K_h(\cdot - Y_j) - f_0 \rangle.$ (8)

Note that $T_{n,h}$ may not be positive, but its expected value is.
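The statistic (8) can be evaluated in the Fourier domain. The sketch below is a minimal illustration under assumptions that are not taken from the paper: Laplace noise with $\Phi^g(u) = 1/(1 + u^2)$ (polynomial noise with $\sigma = 2$), a standard Gaussian null density $f_0$, a plain trapezoidal rule, and hypothetical function names. By Plancherel's formula each scalar product becomes an integral over $|u| \le 1/h$, and the double sum collapses through $\sum_{k \ne j} e^{iu(Y_k - Y_j)} = |\sum_k e^{iuY_k}|^2 - n$, so the cost per frequency is linear in $n$.

```python
import math


def phi_g(u):
    """Fourier transform of Laplace(0, 1) noise (assumed for the example)."""
    return 1.0 / (1.0 + u * u)


def phi_0(u):
    """Fourier transform of the assumed null density f_0 = N(0, 1)."""
    return math.exp(-0.5 * u * u)


# Squared L2-norm of the N(0, 1) density: int f_0^2 = 1 / (2 sqrt(pi)).
NORM_F0_SQ = 1.0 / (2.0 * math.sqrt(math.pi))


def trapezoid(func, a, b, m):
    """Composite trapezoidal rule with m sub-intervals on [a, b]."""
    step = (b - a) / m
    total = 0.5 * (func(a) + func(b))
    for i in range(1, m):
        total += func(a + i * step)
    return total * step


def t_statistic(ys, h, m=400):
    """Evaluate T_{n,h} of (8) for symmetric real phi_g and phi_0."""
    n = len(ys)

    def integrand(u):
        re = sum(math.cos(u * y) for y in ys)
        im = sum(math.sin(u * y) for y in ys)
        # Pairwise kernel-kernel products, summed over k != j.
        pair = (re * re + im * im - n) / (n * (n - 1) * phi_g(u) ** 2)
        # Cross terms between the kernel and f_0.
        cross = 2.0 * re * phi_0(u) / (n * phi_g(u))
        return pair - cross

    # Symmetry in u reduces (1/2pi) * int over [-1/h, 1/h] to (1/pi) * int over [0, 1/h].
    return trapezoid(integrand, 0.0, 1.0 / h, m) / math.pi + NORM_F0_SQ
```

The factor $(\Phi^g(u))^{-2}$ in the pairwise term grows like $|u|^{2\sigma}$, which is why the bandwidth $h$ governs the variance of the statistic.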

In order to construct a testing procedure which is adaptive with respect to the
parameter $\tau$, we introduce a sequence of finite regular grids over the set $T$ of
unknown parameters: $T_N = \{\tau_i ;\ 1 \le i \le N\}$. For each grid point $\tau_i$ we choose
a testing threshold $t_{n,i}^2$ and a bandwidth $h_n^i$ giving a test statistic $T_{n,h_n^i}$.

The test rejects the null hypothesis as soon as at least one of the single tests
based on the parameter $\tau_i$ rejects:

$\Delta_n^* = 1$ if $\sup_{1 \le i \le N} |T_{n,h_n^i}|\, t_{n,i}^{-2} > C^*$, and $\Delta_n^* = 0$ otherwise, (9)

for some constant $C^* > 0$ and finite sequences of bandwidths $\{h_n^i\}_{1 \le i \le N}$ and
thresholds $\{t_{n,i}^2\}_{1 \le i \le N}$.
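The aggregation over the grid can be sketched as follows. This is only an illustration of the decision rule (reject as soon as one normalized statistic exceeds $C^*$); the function name and the way the statistics are passed in are assumptions, and computing the statistics themselves is left to the caller.

```python
def adaptive_test(statistics, thresholds_sq, c_star):
    """Return 1 (reject H_0) if max_i |T_i| / t_i^2 > C*, else 0 (accept).

    statistics    : values T_{n,h_i}, one per grid point
    thresholds_sq : squared thresholds t_{n,i}^2, one per grid point
    c_star        : the constant C*
    """
    if len(statistics) != len(thresholds_sq):
        raise ValueError("one threshold per grid point is required")
    ratios = [abs(t) / thr for t, thr in zip(statistics, thresholds_sq)]
    return 1 if max(ratios) > c_star else 0
```

For instance, `adaptive_test([0.01, -0.5, 0.02], [0.1, 0.1, 0.1], 3.0)` rejects because the second grid point alone exceeds the threshold.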

We note that our asymptotic results work for a large enough constant $C^*$. In
practice we may choose it by Monte-Carlo simulation under the null hypothesis,
for known $f_0$, such that we control the first-type error of the test and
bound it from above, e.g. by $\epsilon/2$.
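One possible way to carry out this Monte-Carlo calibration is sketched below. It is an assumption-laden illustration, not the authors' recipe: `simulate_null_statistic` is a hypothetical user-supplied function that draws a sample under $H_0$ and returns the normalized statistic $\sup_i |T_{n,h_n^i}|\, t_{n,i}^{-2}$, and `level` plays the role of the bound on the first-type error (e.g. $\epsilon/2$).

```python
import random


def calibrate_c_star(simulate_null_statistic, level, n_rep, rng):
    """Smallest simulated value C such that at most a fraction `level`
    of the simulated null statistics is strictly larger than C."""
    if not 0.0 < level < 1.0:
        raise ValueError("level must lie strictly between 0 and 1")
    values = sorted(simulate_null_statistic(rng) for _ in range(n_rep))
    allowed = int(level * n_rep)  # number of exceedances tolerated
    return values[n_rep - allowed - 1]
```

With this choice the empirical rejection frequency under the simulated null does not exceed `level`; the number of replications `n_rep` controls how accurately that frequency approximates the true first-type error.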

Typically, the structure of the grid accounts for two different phenomena. A
first part of the points is dedicated to the adaptation with respect to $\beta$ in case
$\bar{r} = \underline{r} = 0$, whereas the rest of the points is used to adapt the procedure with
respect to $r$ (whatever the value of $\beta$).

In the next two theorems, we fix $\sigma > 1$. We note that the testing rates are
essentially different according to the two different cases where $f_0$ belongs to
a Sobolev class ($r_0 = 0$, $\alpha_0 \ge \underline{\alpha}$ and we assume $\beta_0 = \bar{\beta}$) and where $f_0$ is a
supersmooth function ($\alpha_0 \in [\underline{\alpha}, \bar{\alpha}]$, $r_0 > 0$ and $\beta_0 \in [\underline{\beta}, \bar{\beta}]$, and then we focus
on $r_0 = \bar{r}$ and $\alpha_0 = \bar{\alpha}$). Note that in the first case, the alternative contains
functions $f$ which are smoother ($r > 0$) than the null hypothesis $f_0$.

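When $f_0$ belongs to the Sobolev class $\mathcal{F}(\alpha_0, 0, \bar{\beta}, L)$, the grid is defined as follows.
Let $N \ge 2$ and choose $T_N = \{\tau_i ;\ 1 \le i \le N + 1\}$ such that

$\forall 1 \le i \le N,\ \tau_i = (0; 0; \beta_i)$ and $\underline{\beta} = \beta_1 < \beta_2 < \dots < \beta_N = \bar{\beta}$,
$\forall 1 \le i \le N - 1,\ \beta_{i+1} - \beta_i = (\bar{\beta} - \underline{\beta})/(N - 1)$,
and $\tau_{N+1} = (\bar{\alpha}; \bar{r}; 0)$.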



In this case, the first $N$ points are dedicated to the adaptation with respect to
$\beta$ when $\bar{r} = \underline{r} = 0$, whereas the last point $\tau_{N+1}$ is used to adapt the procedure
with respect to $r$ (whatever the value of $\beta$).

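Theorem 1 Assume $f_0 \in \mathcal{F}(\alpha_0, 0, \bar{\beta}, L)$. The test statistic $\Delta_n^*$ given by (9)
with parameters

$N = \lceil \log n \rceil$; $\forall 1 \le i \le N$: $h_n^i = \left( \frac{n}{\sqrt{\log\log n}} \right)^{-2/(4\beta_i + 4\sigma + 1)}$, $t_{n,i}^2 = \left( \frac{n}{\sqrt{\log\log n}} \right)^{-4\beta_i/(4\beta_i + 4\sigma + 1)}$;

$h_n^{N+1} = n^{-2/(4\bar{\beta} + 4\sigma + 1)}$, $t_{n,N+1}^2 = n^{-4\bar{\beta}/(4\bar{\beta} + 4\sigma + 1)}$,

and any large enough positive constant $C^*$, satisfies (5) for any $\epsilon \in (0, 1)$, with
testing rate $\Psi_n = \{\psi_{n,\tau}\}_{\tau \in T}$ given by

$\psi_{n,\tau} = \left( \frac{n}{\sqrt{\log\log n}} \right)^{-2\beta/(4\beta + 4\sigma + 1)} 1_{r = 0} + n^{-2\bar{\beta}/(4\bar{\beta} + 4\sigma + 1)} 1_{r > 0}, \quad \forall \tau = (\alpha, r, \beta) \in T.$

Moreover, if $f_0 \in \mathcal{F}(\alpha_0, 0, \bar{\beta}, cL)$ for some $0 < c < 1$ and if Assumptions (T)
and (P) hold, then this testing rate is adaptive minimax over the family of
classes $\{\mathcal{F}(\tau, L),\ \tau \in [\underline{\alpha}, \infty) \times \{0\} \times [\underline{\beta}, \bar{\beta}]\}$ (i.e. (6) holds).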

We note that our testing procedure attains the polynomial rate $n^{-2\bar{\beta}/(4\bar{\beta} + 4\sigma + 1)}$
over the union of all classes containing functions smoother than $f_0$. Note
moreover that this rate is known to be a minimax testing rate over the class
$\mathcal{F}(0, 0, \bar{\beta}, L)$ by results in [3]. Therefore we prove that the loss of some power
of $\log\log n$ with respect to the minimax rate is unavoidable. A loss appears
when the alternative contains classes of functions less smooth than $f_0$.

The proof that our adaptive procedure attains the minimax rate relies on the
Berry-Esseen inequality presented in Section 4.



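When $f_0$ belongs to the class $\mathcal{F}(\bar{\alpha}, \bar{r}, \beta_0, L)$ of infinitely differentiable functions,
the grid is defined as follows. Let $N_1, N_2 \ge 2$ and choose $T_N = \{\tau_i ;\ 1 \le i \le
N = N_1 + N_2\}$ such that

$\forall 1 \le i \le N_1,\ \tau_i = (0; 0; \beta_i)$ and $\underline{\beta} = \beta_1 < \beta_2 < \dots < \beta_{N_1} = \bar{\beta}$,
$\forall 1 \le i \le N_1 - 1,\ \beta_{i+1} - \beta_i = (\bar{\beta} - \underline{\beta})/(N_1 - 1)$,
and $\forall 1 \le i \le N_2,\ \tau_{N_1 + i} = (\bar{\alpha}; r_i; \beta_0)$ and $\underline{r} = r_1 < r_2 < \dots < r_{N_2} = \bar{r}$,
$\forall 1 \le i \le N_2 - 1,\ r_{i+1} - r_i = (\bar{r} - \underline{r})/(N_2 - 1)$.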
In this case, the first $N_1$ points are used for adaptation with respect to $\beta$ in
case $\bar{r} = \underline{r} = 0$, whereas the last $N_2$ points are used to adapt the procedure
with respect to $r$ (whatever the value of $\beta$).

Theorem 2 Assume $f_0 \in \mathcal{F}(\bar{\alpha}, \bar{r}, \beta_0, L)$ for some $\beta_0 \in [\underline{\beta}, \bar{\beta}]$. The test statistic
$\Delta_n^*$ given by (9) with $C^*$ large enough and



h' 



N t = flognl; VI < i < A x : 



2/(4ft +4(7+1) 

^/log logn, 



f 2 



A^ 2 = [log log n/(f-r)]; VI < i < N 2 : 
satisfies (5), with testing rate $\Psi_n = \{\psi_{n,\tau}\}_{\tau \in T}$ given by



s -4/3 t /(4ft+4(7+l) 7 

e +j = (^)" 1/r \c<«ex P (4) 



+i = n vlog log log n 



( n \ -^/(Wa+i) (logn) (4,7+1)/(4r) 

= /i j !r=oH 7= (log log logn) 1/4 l rg[r 

V Vlog log n/ V n 



We note that if Assumptions (T) and (P) hold for $f_0$ in $\mathcal{F}(\bar{\alpha}, \bar{r}, \beta_0, L)$, the
same optimality proof as in Theorem 1 gives us that the loss of the $\log\log n$ to
some power factor is optimal over alternatives in $\bigcup_{\alpha \in [\underline{\alpha}, \bar{\alpha}],\, \beta \in [\underline{\beta}, \bar{\beta}]} \mathcal{F}(\alpha, 0, \beta, L)$.
A loss of a $(\log\log\log n)^{1/4}$ factor appears over alternatives of supersmooth
densities (less smooth than $f_0$) with respect to the minimax rate in [3]. We do
not prove that this loss is optimal.



3 Exponentially smooth noise in a semiparametric context 



In this section, we assume the noise density $g$ to be exponentially smooth
and stable (4), for some unknown $s \in [\underline{s}; \bar{s}]$ and fixed (known) bounds
$0 < \underline{s} < \bar{s} \le 2$. More precisely, we suppose that the noise has a symmetric
stable law with Fourier transform

Assumption (S) $\Phi^g(u) = \exp(-|u|^s)$ where $s \in [\underline{s}; \bar{s}]$.

The results of Section 3.1 are valid under the more general assumption (4)
with known scale parameter $\gamma$, which enables us to select the smoothness
parameter among the wider class of not necessarily symmetric stable densities
with known scale parameter. Nevertheless, the exact form of the Fourier transform
$\Phi^g$ is needed for deconvolution purposes (see Section 3.2).
For the model to be identifiable, we must assume that $f$ is not too smooth,
i.e. its Fourier transform does not decay asymptotically faster than a known
polynomial of order $\beta'$.

Assumption (A) There exists some known $A > 0$ such that $|\Phi(u)| \ge A |u|^{-\beta'}$
for large enough $|u|$.

The notation $q_{\beta'}$ is used for the function $u \mapsto A|u|^{-\beta'}$. Under assumptions (S)
and (A) the model is identifiable. Indeed, considering the Fourier transforms,
we get for all real numbers $u$

$\log|\Phi^p(u)| = \log|\Phi(u)| - |u|^s.$

Now assume that we have equality between two Fourier transforms of the
observations, $\Phi_1^p = \Phi_2^p$, where $\Phi_1^p(u) = \Phi_1(u) e^{-|u|^{s_1}}$ and $\Phi_2^p(u) = \Phi_2(u) e^{-|u|^{s_2}}$.

Without loss of generality, we may assume $s_1 \le s_2$. Then we get

$|u|^{-s_1} \log|\Phi_1(u)| - 1 = |u|^{-s_1} \log|\Phi_2(u)| - |u|^{s_2 - s_1}.$

Since $A|u|^{-\beta'} \le |\Phi_i(u)| \le 1$ for large $|u|$ by assumption (A), both terms
$|u|^{-s_1} \log|\Phi_i(u)|$ vanish as $|u|$ tends to infinity, so the left-hand side tends
to $-1$ whereas the right-hand side tends to $-\infty$ unless $s_1 = s_2$. Taking the limit
thus implies that $s_1 = s_2$ and then $\Phi_1 = \Phi_2$, which proves the identifiability of the model.

In this context, $P_{f,s}$ and $E_{f,s}$ respectively denote probability and expectation
with respect to the model under parameters $(f, s)$.



3.1 Estimation of the self-similarity index s

We first present a selection procedure $\hat{s}_n$ which asymptotically recovers the
true value of the smoothness parameter $s$, with a fast rate of convergence. We
use a discrete grid $\{s_1, \dots, s_N\}$, with a number $N$ of points growing to infinity.

The asymptotic behavior of the Fourier transform $\Phi^p$ of the observations is
used to select the smoothness index $s$. More precisely, we have for any large
enough $|u|$

$A|u|^{-\beta'} \exp(-|u|^s) \le |\Phi^p(u)| \le \exp(-|u|^s),$

namely, the function $|\Phi^p|$ asymptotically belongs to the pipe $[q_{\beta'}(u) e^{-|u|^s} ;\ e^{-|u|^s}]$.
Let us now consider a discrete grid $0 < \underline{s} = s_1 < s_2 < \dots < s_N = \bar{s} \le 2$
and denote $\Phi^{[k]}(u) = e^{-|u|^{s_k}}$. The functions $\{\Phi^{[k]}\}_{1 \le k \le N}$ and
$\{q_{\beta'} \Phi^{[k]}\}_{1 \le k \le N}$ form an asymptotically decreasing family, as there exists some
positive real number $u_1$ such that for all real $u$ with $|u| \ge u_1$, we have

$\Phi^{[1]}(u) \ge q_{\beta'}(u) \Phi^{[1]}(u) \ge \Phi^{[2]}(u) \ge \dots \ge \Phi^{[N]}(u) \ge q_{\beta'}(u) \Phi^{[N]}(u).$ (10)

If the size of the grid is sufficiently small, the modulus of the Fourier transform
$\Phi^p$ will asymptotically belong to one of the pipes $[q_{\beta'} \Phi^{[k]} ;\ \Phi^{[k]}]$. Our estimation
procedure uses the empirical estimator



$\hat{\Phi}_n^p(u) = \frac{1}{n} \sum_{j=1}^{n} \exp(-iuY_j), \quad \forall u \in \mathbb{R},$

of the Fourier transform $\Phi^p$ at some point $u_n$ which tends to infinity with
$n$. The procedure selects the smoothness parameter among $\{s_1, \dots, s_N\}$ by
choosing the pipe $[q_{\beta'}(u_n) \Phi^{[k]}(u_n) ;\ \Phi^{[k]}(u_n)]$ closest to $|\hat{\Phi}_n^p(u_n)|$.
More precisely



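$\hat{s}_n = s_k$ if $\frac{1}{2}\{q_{\beta'} \Phi^{[k]} + \Phi^{[k+1]}\}(u_n) \le |\hat{\Phi}_n^p(u_n)| < \frac{1}{2}\{q_{\beta'} \Phi^{[k-1]} + \Phi^{[k]}\}(u_n)$ and $2 \le k \le N - 1$,
$\hat{s}_n = s_1$ if $|\hat{\Phi}_n^p(u_n)| \ge \frac{1}{2}\{q_{\beta'} \Phi^{[1]} + \Phi^{[2]}\}(u_n)$,
$\hat{s}_n = s_N$ if $|\hat{\Phi}_n^p(u_n)| < \frac{1}{2}\{q_{\beta'} \Phi^{[N-1]} + \Phi^{[N]}\}(u_n)$, (11)

where $\{u_n\}_{n \ge 0}$ is a sequence of positive real numbers growing to infinity and
to be chosen later. See Figure 1 for an illustration of this procedure.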




Figure 1. Estimation procedure for $s$. When $|\hat{\Phi}_n^p(u_n)|$ lies in the grey region, we
choose $\hat{s}_n = s_k$.
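The selection rule can be sketched in a few lines. This is a minimal illustration, assuming the cut-offs are the midpoints between the bottom of one pipe and the top of the next; the function names and argument order are hypothetical, and the grid, the constant $A$, the exponent $\beta'$ and the frequency $u_n$ must be supplied by the user.

```python
import cmath
import math


def empirical_cf_modulus(u, ys):
    """Modulus of the empirical transform (1/n) * sum_j exp(-i u Y_j)."""
    return abs(sum(cmath.exp(-1j * u * y) for y in ys) / len(ys))


def select_s(modulus, u, grid, a_const, beta_prime):
    """Pick the grid value s_k whose pipe is closest to `modulus` at frequency u.

    grid       : increasing list s_1 < ... < s_N
    a_const    : the constant A of the lower bound A |u|^(-beta')
    beta_prime : the exponent beta' of that lower bound
    """
    q = a_const * abs(u) ** (-beta_prime)
    tops = [math.exp(-abs(u) ** s) for s in grid]
    # Midpoint between the bottom of pipe k and the top of pipe k + 1.
    cuts = [0.5 * (q * tops[k] + tops[k + 1]) for k in range(len(grid) - 1)]
    for k, cut in enumerate(cuts):
        if modulus >= cut:
            return grid[k]
    return grid[-1]
```

The cut-offs are decreasing in $k$ only when $u$ is large enough for the ordering of the pipes to hold, so the sketch should be used with a frequency in that regime.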
This estimation procedure is well-defined for large enough $n$, as for any $2 \le
k \le N - 1$, we have $\{q_{\beta'} \Phi^{[k]} + \Phi^{[k+1]}\}(u_n) \le \{q_{\beta'} \Phi^{[k-1]} + \Phi^{[k]}\}(u_n)$.

This procedure is proved to be consistent, with an exponential rate of convergence,
in the following proposition.

Proposition 1 Under assumptions (S) and (A), consider the estimation procedure
given by (11) where

$u_n = \left( \frac{\log n}{2} - \frac{2\beta' + a\bar{s}}{2\bar{s}} \log\log n \right)^{1/\bar{s}},$

for some fixed $a > 1$ and the equidistant grid $\underline{s} = s_1 < s_2 < \dots < s_N = \bar{s}$ is
chosen as

\s k+1 - Sfe| = d n = s(logn) _1 (loglogra) _1 ;iV-l= (s-s)/d n . 
Then, $\hat{s}_n$ is strongly consistent, i.e.

$\lim_{n \to \infty} \hat{s}_n = s, \quad P_{f,s}$-almost surely.

Moreover, for each number of observations $n$, denote by $s_n(s)$ the unique point
$s_k$ on the grid such that $s_k \le s < s_{k+1}$. We have

P/,»(sn »n(s)) < exp (~ (log n) a (1 + o(l))^ , 

where $A$ is defined in Assumption (A) and $a > 1$ depends on the choice of $u_n$.
Remark 2 The result remains valid for any sequence $d_n$ satisfying
d n u s n \ogu n < 1 and log(l/d n ) = o((logn) a ). 



3.2 Adaptive estimation and tests 



For the rest of this section, we shall assume that the unknown density $f$ belongs
to some Sobolev class $\mathcal{F}(0, 0, \beta, L)$ where $\beta > 0$ is the smoothness parameter
and $L$ is a positive constant. We assume that the unknown parameter $\beta$ belongs
to some known interval $[\underline{\beta}, \bar{\beta}]$.

We now plug the preliminary estimator of $s$ into the usual estimation and testing
procedures.

Let us introduce the kernel deconvolution estimator $K_n$ built on the preliminary
estimation of $s$ and defined by its Fourier transform $\Phi^{K_n}$,

$\Phi^{K_n}(u) = \exp\{ (|u|/h_n)^{\hat{s}_n} \}\, 1_{|u| \le 1},$ (12)

where h n = ( — -r— — — log log . (13) 

y z s n J 

Note that both the bandwidth sequence $h_n$ and the kernel $K_n$ are random and
depend on the observations $Y_1, \dots, Y_n$. Now, the estimator of $f$ is given by

$f_n(x) = \frac{1}{n h_n} \sum_{j=1}^{n} K_n\left( \frac{x - Y_j}{h_n} \right).$ (14)

This estimation procedure is consistent and adaptively achieves the minimax
rate of convergence when considering unknown densities $f$ in the union of
Sobolev balls $\mathcal{F}(0, 0, \beta, L)$ with $\beta \in [\underline{\beta}, \bar{\beta}] \subset (1/2; +\infty)$ and unknown smoothness
parameter $s \in [\underline{s}; \bar{s}]$.

Note that when a function belongs to $\mathcal{F}(0, 0, \beta, L)$ and assumption (A) is
fulfilled, we necessarily have $\beta' > \beta + 1/2$.

Corollary 1 Under assumptions (S) and (A), for any $\bar{\beta} > \underline{\beta} > 1/2$, the
estimation procedure given by (14) which uses the estimator $\hat{s}_n$ defined by (11)
with parameter values: $u_n$ given by Proposition 1 with $a > \bar{s}/\underline{s}$,

d n = min j(logn) - ^ -1 / 2 ^-, s(logn loglogn) _1 | , 
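satisfies, for any real number $x$,

$\limsup_{n \to \infty} \sup_{s \in [\underline{s}, \bar{s}]} \sup_{\beta \in [\underline{\beta}, \bar{\beta}]} \sup_{f \in \mathcal{F}(0, 0, \beta, L)} (\log n)^{(2\beta - 1)/s}\, E_{f,s} |f_n(x) - f(x)|^2 < \infty.$

Moreover, this rate of convergence is asymptotically adaptive optimal.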

Remark 3 This result is obtained by using that, with high probability, the
estimator $\hat{s}_n$ is equal to the point $s_k$ on the grid such that $s_k \le s < s_{k+1}$
(see Proposition 1). Then, using the deconvolution kernel built on $s_k$ is as
good as using the true value $s$, as soon as the difference $|s_k - s|$ is sufficiently
small (which is ensured by the size of the grid). Note that the fact that we
underestimate $s$ by using $s_k \le s$ is rather important, as deconvolution with
overestimated $s$ would lead to unbounded risk.

Note that the optimality of this procedure is a direct consequence of a result 
by where he considers the convolution model for circular data with $\beta$ and $s$
fixed and known. Therefore we may say that there is no loss due to adaptation,
either with respect to $s$ or to $\beta$.

Using the same kernel (12) and the same random bandwidth (13),
we define

$T_n = \frac{2}{n(n-1)} \sum\sum_{1 \le k < j \le n} \left\langle \frac{1}{h_n} K_n\left( \frac{\cdot - Y_k}{h_n} \right),\ \frac{1}{h_n} K_n\left( \frac{\cdot - Y_j}{h_n} \right) \right\rangle.$ (15)

Corollary 2 Under assumptions (S) and (A), for any $\bar{\beta} > \underline{\beta} > 0$, the estimation
procedure given by (15) which uses the estimator $\hat{s}_n$ defined by (11) with
parameter values: $u_n$ given by Proposition 1 with $a > \bar{s}/\underline{s}$,

d n = min {(logn) -2 ^/-, s(lognloglogn) _1 | , 

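satisfies,

$\limsup_{n \to \infty} \sup_{s \in [\underline{s}, \bar{s}]} \sup_{\beta \in [\underline{\beta}, \bar{\beta}]} \sup_{f \in \mathcal{F}(0, 0, \beta, L)} (\log n)^{2\beta/s} \left\{ E_{f,s} \left| T_n - \int f^2 \right|^2 \right\}^{1/2} < \infty.$

Moreover, under additional Assumption (E), this rate of convergence is asymptotically
adaptive optimal.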

The rate of convergence of this procedure is the same as in the case of known
self-similarity index $s$ and known smoothness parameter $\beta$. It is thus asymptotically
adaptive optimal according to results obtained by [3].



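Let us now define, for any $f_0 \in \mathcal F(0,0,\bar\beta,L)$,

$$T_n^0 = \frac{2}{n(n-1)}\sum\sum_{1\le k<j\le n}\Big\langle \frac{1}{\hat h_n}\hat K_n\Big(\frac{\cdot-Y_k}{\hat h_n}\Big)-f_0,\ \frac{1}{\hat h_n}\hat K_n\Big(\frac{\cdot-Y_j}{\hat h_n}\Big)-f_0\Big\rangle. \qquad (16)$$

This statistic is used for goodness-of-fit testing of the hypothesis

$$H_0 : f = f_0$$

versus $H_1(C,\Psi_n) : f \in \cup_{\beta\in[\underline\beta,\bar\beta]}\{f \in \mathcal F(0,0,\beta,L)$ and $\psi_{n,\beta}^{-2}\|f-f_0\|_2^2 \ge C\}$.
The test is constructed as usual

$$\Delta_n^* = \begin{cases} 1 & \text{if } |T_n^0|\,\hat t_n^{-2} > C^*, \\ 0 & \text{otherwise,} \end{cases} \qquad (17)$$

for some constant $C^* > 0$ and a random threshold $\hat t_n^2$ to be specified.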

Corollary 3 Under assumptions (S) and (A), for any $0 < \underline\beta < \bar\beta$, any $L > 0$
and for any $f_0 \in \mathcal F(0,0,\bar\beta,L)$, consider the testing procedure given by (17)
which uses the test statistic (16) with estimator $\hat s_n$ defined by (11) with
parameter values: $u_n$ given by Proposition 1 with $a > 1$,

d n = min {(logn) - ^-, s(lognloglogn) _1 | , 
with random threshold and (slightly modified) random bandwidth 

and any large enough positive constant $C^*$. This testing procedure satisfies (5)
for any $\varepsilon \in (0,1)$ with testing rate



log n \ 



Moreover, if $f_0 \in \mathcal F(0,0,\bar\beta,cL)$ for some $0 < c < 1$ and if Assumptions (T)
and (E) hold, then this testing rate is asymptotically adaptive optimal over
the family of classes $\{\mathcal F(0,0,\beta,L),\ \beta\in[\underline\beta;\bar\beta]\}$ and for any $s\in[\underline s;\bar s]$ (i.e. (6)
holds).

Adaptive optimality (namely (6)) of this testing procedure directly follows
from [3] as there is no loss due to adaptation to $\beta$ nor to $s$. Note also that
the case of known $s$ and adaptation only with respect to $\beta$ is included in our
results and is entirely new.



4 Auxiliary result: Berry-Esseen inequality for degenerate U-statistics
of order 2

This section is dedicated to the statement of a non-uniform Berry-Esseen type
theorem for degenerate U-statistics. It draws its inspiration from [13] which
provides a central limit theorem for degenerate U-statistics. Given a sample
$Y_1,\ldots,Y_n$ of i.i.d. random variables, we shall consider U-statistics of the form

$$U_n = \sum\sum_{1\le i<j\le n} H(Y_i,Y_j),$$

where $H$ is a symmetric function. We may assume, without loss of generality,
that $E\{H(Y_1,Y_2)\} = 0$ and thus $U_n$ is centered. We shall focus on degenerate
U-statistics, namely

$$E\{H(Y_1,Y_2)\,|\,Y_1\} = 0, \quad \text{almost surely.}$$

Limit theorems for degenerate U-statistics when $H$ is fixed (independent of
the sample size $n$) are well-known and can be found in any monograph on the
subject (see for instance [17]). In that case, the limit distribution is a linear
combination of independent and centered $\chi^2(1)$ (chi-square with one degree of
freedom) distributions. However, as noticed in [13], a normal distribution may
result in some cases where $H$ depends on $n$. In such a context, [13] provides a
central limit theorem. But this result is not enough for our purpose (namely,
optimality in Theorem 1). Indeed, we need to control the convergence to zero
of the difference between the cumulative distribution function (cdf) of our
U-statistic, and the cdf of the Gaussian distribution. Such a result may be
derived using classical martingale methods.

In the rest of this section, $n$ is fixed. Denote by $\mathcal F_i$ the $\sigma$-field generated by
the random variables $\{Y_1,\ldots,Y_i\}$. Define

$$Z_i = \frac{1}{v_n}\sum_{j=1}^{i-1} H(Y_i,Y_j), \quad 2\le i\le n,$$

and note that as the U-statistic is degenerate, we have $E(Z_i\,|\,Y_1,\ldots,Y_{i-1}) = 0$.
Thus,

$$S_k = \sum_{i=2}^{k} Z_i, \quad 2\le k\le n,$$

is a centered martingale (with respect to the filtration $\{\mathcal F_k\}_{k\ge 2}$) and $S_n = v_n^{-1}U_n$.
We use a non-uniform Berry-Esseen type theorem for martingales provided by
[14], Theorem 3.9. Denote by $\phi$ the cdf of the standard Normal
distribution and introduce the conditional variance of the increments $Z_i$'s,

$$V_n^2 = \sum_{i=2}^{n} E(Z_i^2\,|\,\mathcal F_{i-1}) = \frac{1}{v_n^2}\sum_{i=2}^{n} E\Big\{\Big(\sum_{j=1}^{i-1}H(Y_i,Y_j)\Big)^2\,\Big|\,\mathcal F_{i-1}\Big\}.$$

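Theorem 3 Fix $0 < \delta \le 1$ and define

$$L_n = \sum_{i=2}^{n} E|Z_i|^{2+2\delta} + E|V_n^2-1|^{1+\delta}.$$

There exists a positive constant $C$ (depending only on $\delta$) such that for any
$0 < \varepsilon < 1/2$ and any real $x$

$$|P(U_n \le x) - \phi(x/v_n)| \le 16\,\varepsilon^{1/2}\exp\Big(-\frac{x^2}{4v_n^2}\Big) + \frac{C}{\varepsilon^{1+\delta}}L_n.$$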



5 Proofs 



We use $C$ to denote an absolute constant whose value may change from line
to line.

Proof of Theorem 1 (Upper bound). Let us give a sketch of the proof
of the upper bound of the test. The statistic $T_{n,h^i}$ will be abbreviated
by $T_{n,i}$. We first need to control the first-type error of the test.

$$P_0(\Delta^* = 1) = P_0(\exists i\in\{1,\ldots,N+1\} \text{ such that } |T_{n,i}| > C^*t_{n,i}^2)$$
$$\le \sum_{i=1}^{N+1} P_0(|T_{n,i}-E_0(T_{n,i})| > C^*t_{n,i}^2 - E_0(T_{n,i})).$$

The proof relies on the two following lemmas.

Lemma 1 For any large enough $C^* > 0$, we have

$$\sum_{i=1}^{N} P_0(|T_{n,i}-E_0(T_{n,i})| > C^*t_{n,i}^2 - E_0(T_{n,i})) = o(1).$$

Lemma 2 For large enough $C^*$, there is some $\varepsilon\in(0,1)$, such that

$$P_0(|T_{n,N+1}-E_0(T_{n,N+1})| > C^*t_{n,N+1}^2 - E_0(T_{n,N+1})) \le \varepsilon.$$

Lemma 1 relies on the Berry-Esseen type theorem (Theorem 3) presented in
Section 4. Its proof is postponed to the very end of the present proof. Proof
of Lemma 2 is easy and omitted. Note for the referee: omitted proofs appear
in the appendix.

Thus, the first type error term is as small as we need, as soon as we choose a 
large enough constant C* > in (Q. We now focus on the second- type error 
of the test. We write 

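$$\sup_{\tau}\ \sup_{f\in\mathcal F(\tau,L),\ \|f-f_0\|_2^2\ge C\psi_{n,\tau}^2} P_f(\Delta^* = 0)$$
$$\le 1_{r>0}\ \sup_{r\in[\underline r;\bar r],\,\alpha\ge\underline\alpha,\,\beta\in[\underline\beta,\bar\beta]}\ \sup_{f\in\mathcal F(\tau,L),\ \|f-f_0\|_2^2\ge C\psi_{n,\tau}^2} P_f(|T_{n,N+1}| \le C^*t_{n,N+1}^2)$$
$$+\ 1_{r=\bar r=0}\ \sup_{\alpha\ge\underline\alpha,\,\beta\in[\underline\beta,\bar\beta]}\ \sup_{f\in\mathcal F(\alpha,0,\beta,L),\ \|f-f_0\|_2^2\ge C\psi_{n,(\alpha,0,\beta)}^2} P_f(\forall 1\le i\le N,\ |T_{n,i}| \le C^*t_{n,i}^2),$$

where $\tau = (\alpha,r,\beta)$.

Note that when the function $f$ in the alternative is supersmooth ($r > 0$), we
only need the last test (with index $N+1$), whereas when it is ordinary smooth
($r = \bar r = 0$), we use the family of tests with indexes $i \le N$. In this second
case, we use in fact only the test based on parameter $\beta_f$ defined as the smallest
point on the grid larger than $\beta$ (see the proof of Lemma 3 below).

Lemma 3 We have

$$\sup_{\alpha\ge\underline\alpha}\ \sup_{\beta\in[\underline\beta,\bar\beta]}\ \sup_{f\in\mathcal F(\alpha,0,\beta,L),\ \|f-f_0\|_2^2\ge C\psi_{n,(\alpha,0,\beta)}^2} P_f(\forall 1\le i\le N,\ |T_{n,i}| \le C^*t_{n,i}^2) = o(1).$$

Lemma 4 Fix $r > 0$, for any $\alpha \ge \underline\alpha$, $r\in[\underline r;\bar r]$, $\beta\in[\underline\beta,\bar\beta]$. For any $\varepsilon\in(0;1)$,
there exists some large enough $C^\circ$ such that for any $C > C^\circ$ and any $f \in
\mathcal F(\alpha,r,\beta,L)$ such that $\|f-f_0\|_2^2 \ge C\psi_{n,(\alpha,r,\beta)}^2$, we have

$$P_f(|T_{n,N+1}| \le C^*t_{n,N+1}^2) \le \varepsilon.$$

The proof of Lemma 3 (resp. 4) is postponed (resp. omitted) to the very end
of the present proof. Thus, the second type error of the test converges to zero.
This ends the proof of the upper bound. ■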

We now present the proofs of the lemmas. 

Proof of Lemma 1. Let us set $\rho_n = (\log\log n)^{-1/2}$ and fix $1\le i\le N$. We
use the obvious notation $p_0 = f_0 * g$. As we have

$$E_0(T_{n,i}) = \|K_{h^i}*p_0 - f_0\|_2^2 = \|J_{h^i}*f_0 - f_0\|_2^2,$$

and $\langle K_h(\cdot-Y_1) - J_h*f_0,\ J_h*f_0 - f_0\rangle = 0$,

we easily get

$$T_{n,i} - E_0(T_{n,i}) = \frac{2}{n(n-1)}\sum\sum_{1\le k<j\le n}\langle K_{h^i}(\cdot-Y_k) - J_{h^i}*f_0,\ K_{h^i}(\cdot-Y_j) - J_{h^i}*f_0\rangle.$$

Let us set

$$H(Y_j,Y_k) = 2\{n(n-1)\}^{-1}\langle K_{h^i}(\cdot-Y_k) - J_{h^i}*f_0,\ K_{h^i}(\cdot-Y_j) - J_{h^i}*f_0\rangle$$

and note that $H$ is a symmetric function with $E_0\{H(Y_1,Y_2)\} = 0$ and
$E_0\{H(Y_1,Y_2)\,|\,Y_1\} = 0$. As a consequence, $T_{n,i} - E_0(T_{n,i})$ is a degenerate
U-statistic. Using Theorem 3 (and the notation of Section 4) to control its cdf,
we get that for any $0 < \delta \le 1$, for any $0 < \varepsilon < 1/2$ and any $x$

$$|P_0(T_{n,i} - E_0(T_{n,i}) > x) - (1-\phi(x/v_n))|$$
$$\le 16\,\varepsilon^{1/2}\exp\Big\{-\frac{x^2}{4v_n^2}\Big\} + \frac{C}{\varepsilon^{1+\delta}}\Big\{\sum_{i=2}^{n}E_0|Z_i|^{2+2\delta} + E_0|V_n^2-1|^{1+\delta}\Big\},$$

where $v_n^2 = \mathrm{Var}_0(T_{n,i})$ and

$$Z_i = \frac{1}{v_n}\sum_{j=1}^{i-1}H(Y_i,Y_j) \quad\text{and}\quad V_n^2 = \sum_{i=2}^{n}E_0(Z_i^2\,|\,\mathcal F_{i-1})$$

as in Section 4. Choose $\delta = 1$ and consider $\varepsilon$ constant (optimization in $\varepsilon$
is not necessary in our context), thus

$$|P_0(T_{n,i} - E_0(T_{n,i}) > x) - (1-\phi(x/v_n))| \le C\exp\Big(-\frac{x^2}{4v_n^2}\Big) + C\Big(\sum_{i=2}^{n}E_0 Z_i^4 + E_0|V_n^2-1|^2\Big). \qquad (18)$$



We want to apply this inequality at the point $x = C^*t_{n,i}^2 - E_0(T_{n,i})$. First, note
that

$$E_0(T_{n,i}) = \|J_{h^i}*f_0 - f_0\|_2^2 = \frac{1}{2\pi}\int_{|u|>1/h^i}|\Phi_0(u)|^2du \le L(h^i)^{2\bar\beta} \le Lt_{n,i}^2,$$

leading to

$$x \ge (C^*-L)t_{n,i}^2 = (C^*-L)(n\rho_n)^{-4\beta_i/(4\beta_i+4\sigma+1)}$$

and we choose $C^* > L$. Now, the variance term $v_n^2$ satisfies (see [3])

$$v_n^2 = E_0(T_{n,i} - E_0(T_{n,i}))^2 = \frac{C}{n^2(h^i)^{4\sigma+1}}(1+o(1)).$$

Using the choice of the bandwidth $h^i$, we obtain a bound of the first term in (18)

$$C\exp\Big(-\frac{x^2}{4v_n^2}\Big) \le C\exp(-b\rho_n^{-2}) = C(\log n)^{-b},$$

where $b = (C^*)^2/(C)$ can be chosen as large as we need. Let us deal with the
other terms appearing in (18). For large enough $n$,
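
The arithmetic behind the $(\log n)^{-b}$ bound can be checked numerically. The sketch below assumes the bandwidth $h = (n\rho_n)^{-2/(4\beta+4\sigma+1)}$, which is the choice consistent with the threshold $t^2 = (n\rho_n)^{-4\beta/(4\beta+4\sigma+1)}$ used above; the function names and the numerical values of $n$, $\beta$, $\sigma$ are illustrative assumptions.

```python
import math

def rho(n):
    """rho_n = (log log n)^(-1/2)."""
    return math.log(math.log(n)) ** -0.5

def bandwidth(n, beta, sigma):
    """Assumed bandwidth h = (n rho_n)^(-2 / (4 beta + 4 sigma + 1))."""
    return (n * rho(n)) ** (-2.0 / (4 * beta + 4 * sigma + 1))

def threshold_sq(n, beta, sigma):
    """Threshold t^2 = (n rho_n)^(-4 beta / (4 beta + 4 sigma + 1))."""
    return (n * rho(n)) ** (-4.0 * beta / (4 * beta + 4 * sigma + 1))

n, beta, sigma = 10**6, 1.5, 1.0  # illustrative values
h = bandwidth(n, beta, sigma)
t2 = threshold_sq(n, beta, sigma)
# Up to constants, x^2 / v_n^2 is t^4 * n^2 * h^(4 sigma + 1) = rho_n^(-2) = log log n.
ratio = t2 ** 2 * n ** 2 * h ** (4 * sigma + 1)
```

With these choices $t^2 = h^{2\beta}$ and the ratio equals $\log\log n$, so the Gaussian tail term is of order $\exp(-b\log\log n) = (\log n)^{-b}$.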

$$|\langle K_{h^i}(\cdot-Y_k) - J_{h^i}*f_0,\ K_{h^i}(\cdot-Y_j) - J_{h^i}*f_0\rangle| \le \frac{1}{\pi}\int_{|u|\le 1/h^i}|\Phi^g(u)|^{-2}du \le \frac{C}{(h^i)^{2\sigma+1}}$$

and thus, for any $p \ge 2$,

$$E_0\{|H(Y_1,Y_2)|^{2p}\} \le Cn^{-4p}(h^i)^{-2p(2\sigma+1)}.$$

This leads to

$$\sum_{i=2}^{n}E_0Z_i^4 = \frac{1}{v_n^4}\sum_{i=2}^{n}E_0\Big(\sum_{j=1}^{i-1}H(Y_i,Y_j)\Big)^4$$
$$\le \frac{1}{v_n^4}\sum_{i=2}^{n}\Big((i-1)E_0(H(Y_1,Y_2)^4) + 3(i-1)(i-2)E_0(H(Y_1,Y_2)^2H(Y_1,Y_3)^2)\Big)$$
$$\le \frac{C}{v_n^4}n^2E_0(H(Y_1,Y_2)^4) + \frac{C}{v_n^4}n^3E_0(H(Y_1,Y_2)^2H(Y_1,Y_3)^2) = o(1).$$

Moreover, following the lines of the proof of Theorem 1 in [13] we get

$$E_0|V_n^2-1|^2 \le \frac{Cn^4}{v_n^4}\Big(E_0(G^2(Y_1,Y_2)) + \frac{1}{n}E_0(H^4(Y_1,Y_2))\Big),$$

where $G(x,y) = E_0(H(Y_1,x)H(Y_1,y))$. In [1] this last term was bounded from
above for this model by $Ch^i$, so

$$E_0|V_n^2-1|^2 \le Ch^i \le C(\log n)^{-b}.$$

Returning to (18) we finally get for $x = C^*t_{n,i}^2 - E_0(T_{n,i})$,

$$|P_0(T_{n,i} - E_0(T_{n,i}) > x) - \{1-\phi(x/v_n)\}| \le C((\log n)^{-b} + h^i) \le C(\log n)^{-b}.$$

Finally we obtain, for $b$ large when $C^*$ is large

$$\sum_{i=1}^{N}P_0(|T_{n,i} - E_0(T_{n,i})| > C^*t_{n,i}^2 - E_0(T_{n,i})) \le N(1-\phi(x/v_n) + C(\log n)^{-b})$$
$$\le CN\big(v_nx^{-1}\exp\{-x^2/(2v_n^2)\} + (\log n)^{-b}\big) \le CN\rho_n(\log n)^{-b} \le C\frac{(\log\log n)^{-1/2}}{(\log n)^{b-1}}.$$


Proof of Lemma 3. When $\bar r = r = 0$, let us fix some constant $C > C^\circ$
($C^\circ$ will be chosen later) and a density $f$ belonging to $\mathcal F(\alpha,0,\beta,L)$ for some
unknown $\alpha \ge \underline\alpha$ and $\beta\in[\underline\beta;\bar\beta]$ which satisfies $\|f-f_0\|_2^2 \ge C\psi_{n,(\alpha,0,\beta)}^2$ (choose
$\beta$ as the largest one). In this proof, we abbreviate $\psi_{n,(\alpha,0,\beta)}$ to $\psi_{n,\beta}$ since in
this case, the rate only depends on $\beta$. We define $\beta_f$ as the smallest point on
the finite grid $\{\underline\beta = \beta_1 < \beta_2 < \cdots < \beta_N = \bar\beta\}$ such that $\beta \le \beta_f$:

$$\beta_f \in \{\underline\beta = \beta_1 < \beta_2 < \cdots < \beta_N = \bar\beta\},\quad f\in\mathcal F(\alpha,0,\beta,L),\quad \|f-f_0\|_2^2 \ge C\psi_{n,\beta}^2,$$
$$\beta \le \beta_f \ \text{ and }\ \forall\beta_i < \beta_f, \text{ we have } \beta > \beta_i. \qquad (19)$$

We shall abbreviate to $h_f$, $t_{n,f}^2$ and $T_{n,f}$ the bandwidth, the threshold (both
defined in Theorem 1) and the test statistic corresponding to parameter $\beta_f$.
We write

$$P_f(\forall i\in\{1,\ldots,N\},\ |T_{n,i}| \le C^*t_{n,i}^2)$$
$$\le P_f(|T_{n,f} - E_f(T_{n,f})| \ge -C^*t_{n,f}^2 + E_f(T_{n,f}))$$
$$\le P_f(|T_{n,f} - E_f(T_{n,f})| \ge \|f-f_0\|_2^2 - C^*t_{n,f}^2 + B_f(T_{n,f})), \qquad (20)$$

where

$$B_f(T_{n,f}) = E_f(T_{n,f}) - \|f-f_0\|_2^2 = \|J_h*f\|_2^2 - \|f\|_2^2 + 2\langle f - J_h*f,\ f_0\rangle$$

is in fact a bias term. It satisfies



\B f (T nJ )\< ( mu)\ 2 du + 2([ \<$>{u)\ 2 du f |$o(u)| 2 du) 1/2 

J\u\>X/hf J\u\>X/h f J\u\>X/h f 

< Le~^(hf + 2h^) < 3e-^Lhf, 
as / belongs to T{a, 0, ft, L) C JF(a, 0, /3, L). 

Let us study the variance term $E_f(T_{n,f} - E_f(T_{n,f}))^2$. According to [3], this
term is upper-bounded by $w_{n,f}^2$ given by

$$w_{n,f}^2 = \frac{C}{n^2h_f^{4\sigma+1}} + \frac{4\Omega_g^2(f-f_0)}{n},$$

and $\Omega_g(f-f_0)$ is a constant depending on $f$ and $g$ (but not $n$) and satisfying
$|\Omega_g^2(f-f_0)| \le C\|f-f_0\|_2^{2-2\sigma/\beta}$ (see proof of Theorem 6 in [3]).

Using Markov's inequality, this leads to the following upper bound of (1201) 



W n,f 



(\\f-fo\\l-enl f -3e-^Lhfr 



We will proceed differently when ft < a and when ft > a. Let us first consider 
the term concerning ft < a. The point is to use that / satisfies ||/ — / ||| > 
Cip 2 g. Note that we have ftf > ft, constants C > C* and 



^2 j-2 = ^^^4(/3 / -/3)(4a+l)/{(4/3 / +4 ff +l)(4/3+4a+l)} ) 



ensuring that the term Cip 2 a — CH^ f is always positive. Moreover, as > 
ft — ftf > —(ft — ft)/^ogn, we have 



leftjft - ft f ) 

(4/3 / + 4a + l)(4/3 + 4a + l) 



Thus, we choose $C^\circ = C^* + 3e^{-2\underline\alpha}L/C_1$ such that for any $C > C^\circ$, we have

$$\|f-f_0\|_2^2 - C^*t_{n,f}^2 - 3e^{-2\underline\alpha}Lh_f^{2\beta} \ge (C - C^* - 3e^{-2\underline\alpha}L/C_1)\psi_{n,\beta}^2 = a\,\psi_{n,\beta}^2,$$

with $a > 0$. Thus, we get



sup sup_ sup F f (\/ie{l,...,N}, \T n)i \<Cn 2 nti ) 



II ~ ~ llO^ , o 



f-fo\\ 2 2 >C^l 




Cn 2 nJ - 3e^Lhf) 2 



2-2a/P 



Finally, this leads to the bound 




2+2a//3 



c 



{a/cy 



which converges to zero as $n$ tends to infinity. ■

Proof of Theorem 1 (Lower bound).

As we already noted after the theorem statement, our test procedure attains
the minimax rate associated to the class $\mathcal F(\alpha_0,0,\bar\beta,L)$ where $f_0$ belongs,
whenever the alternative $f$ belongs to classes of functions smoother than $f_0$.
Therefore, the lower bound we need to prove concerns the optimality of the loss of
order $(\log\log n)^{1/2}$ due to alternatives less smooth than $f_0$.
More precisely, we prove (6), where the alternative $H_1(C,\Psi_n)$ is now restricted
to $\cup_{\beta\in[\underline\beta,\bar\beta]}\{f\in\mathcal F(0,0,\beta,L)$ and $\psi_{n,\beta}^{-2}\|f-f_0\|_2^2 \ge C\}$ and $\psi_{n,\beta}$ denotes the rate
$\psi_{n,\tau}$ when $\tau = (0,0,\beta,L)$.

The general approach for proving such a lower bound (6) is to exhibit a finite
number of regularities $\{\beta_k\}_{1\le k\le K}$ and corresponding probability distributions
$\{\pi_k\}_{1\le k\le K}$ on the alternatives $H_1(C,\psi_{n,\beta_k})$ (more exactly, on parametric
subsets of these alternatives) such that the distance between the distributions
induced by $f_0$ (the density being tested) and the mean distribution of the
alternatives is small.

We use a finite grid $B = \{\beta_1 < \beta_2 < \ldots < \beta_K\} \subset [\underline\beta,\bar\beta]$ such that

$$\forall\beta\in[\underline\beta,\bar\beta],\ \exists k : |\beta_k - \beta| \le \frac{1}{\log n}.$$

To each point $\beta$ in this grid, we associate a bandwidth

$$h_\beta = (n\rho_n)^{-\frac{2}{4\beta+4\sigma+1}},\quad \rho_n = (\log\log n)^{-1/2} \quad\text{and}\quad M_\beta = h_\beta^{-1}.$$

We use the same deconvolution kernel as in [3], constructed as follows. Let $G$
be defined as in Lemma 2 in [3]. The function $G$ is an infinitely differentiable
function, compactly supported on $[-1,0]$ and such that $\int G = 0$. Then, the
deconvolution kernel $H_\beta$ is defined via its Fourier transform $\Phi^{H_\beta}$ by

$$\Phi^{H_\beta}(u) = \Phi^G(h_\beta u)(\Phi^g(u))^{-1}.$$

Note that the factor $\rho_n$ in the bandwidth's expression corresponds to the loss
for adaptation.

We also consider for each $\beta$, a probability distribution $\pi_\beta$ (also denoted $\pi_k$
when $\beta = \beta_k$) defined on $\{-1,+1\}^{M_\beta}$ which is in fact the product of Rademacher
distributions on $\{-1,+1\}$ and a parametric subset of $H_1(C,\psi_{n,\beta})$ containing
the following functions

$$f_{\theta,\beta}(x) = f_0(x) + \sum_{j=1}^{M_\beta}\theta_jh_\beta^{\beta+\sigma+1}H_\beta(x - x_{j,\beta}),\qquad \begin{cases}\theta_j \text{ i.i.d. with } P(\theta_j = \pm1) = 1/2,\\ x_{j,\beta} = jh_\beta\in[0,1].\end{cases}$$

Convolution of these functions with $g$ induces another parametric set of functions

$$p_{\theta,\beta}(y) = p_0(y) + \sum_{j=1}^{M_\beta}\theta_jh_\beta^{\beta+\sigma+1}G_\beta(y - x_{j,\beta}),$$

where $G_\beta(y) = h_\beta^{-1}G(y/h_\beta) = H_\beta * g(y)$.

As established in [3] (Lemmas 2 and 4), for any $\beta$, any $\theta\in\{-1,+1\}^{M_\beta}$ and
small enough $h_\beta$ (i.e. large enough $n$) the function $f_{\theta,\beta}$ is a probability density
and belongs to the Sobolev class $\mathcal F(0,0,\beta,L)$ and $p_{\theta,\beta}$ is also a probability
density. Moreover we have

$$\pi_\beta\big(\|f_{\theta,\beta} - f_0\|_2^2 \ge C\psi_{n,\beta}^2\big) \to_{n\to\infty} 1,$$

which means that for each $\beta$, the random parametric family $\{f_{\theta,\beta}\}_\theta$ belongs
almost surely (with respect to the measure $\pi_\beta$) to the alternative set $H_1(C,\psi_{n,\beta})$.
The subset of functions which are not in the alternative $H_1(C,\Psi_n)$ is
asymptotically negligible. We then have,

$$\gamma_n = \inf_{\Delta_n}\Big\{P_0(\Delta_n = 1) + \sup_{f\in H_1(C,\Psi_n)}P_f(\Delta_n = 0)\Big\}$$
$$\ge \inf_{\Delta_n}\Big\{P_0(\Delta_n = 1) + \frac{1}{K}\sum_{k=1}^{K}\sup_{f\in H_1(C,\psi_{n,\beta_k})}P_f(\Delta_n = 0)\Big\}$$
$$\ge \inf_{\Delta_n}\Big\{P_0(\Delta_n = 1) + \frac{1}{K}\sum_{k=1}^{K}\int_\theta P_{f_{\theta,\beta_k}}(\Delta_n = 0)\,\pi_k(d\theta)\Big\} + o(1).$$

Let us denote by

$$\pi = \frac{1}{K}\sum_{k=1}^{K}\pi_k \quad\text{and}\quad P_\pi = \frac{1}{K}\sum_{k=1}^{K}P_k = \frac{1}{K}\sum_{k=1}^{K}\int_\theta P_{f_{\theta,\beta_k}}\pi_k(d\theta).$$

Those notations lead to

$$\gamma_n \ge \inf_{\Delta_n}\{P_0(\Delta_n = 1) + P_\pi(\Delta_n = 0)\}$$
$$\ge \inf_{\Delta_n}\Big\{1 - \int_{\Delta_n=0}dP_0 + \int_{\Delta_n=0}dP_\pi\Big\} \ge 1 - \sup_A\int_A(dP_0 - dP_\pi)$$
$$\ge 1 - \frac{1}{2}\|P_\pi - P_0\|_1, \qquad (21)$$

where we used Scheffé's Lemma.
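
The identity behind the last step, namely that the smallest sum of the two error probabilities equals one minus half the $L_1$-distance, can be checked on a finite sample space. The sketch below is an illustration only: the two discrete distributions `p0` and `p1` are made-up values, and the function names are not from the paper.

```python
from itertools import product

def min_error_sum(p0, p1):
    """Smallest P0(test = 1) + P1(test = 0) over deterministic tests.

    The optimal test rejects exactly where p1 > p0, so the sum is sum(min(p0, p1)).
    """
    return sum(min(a, b) for a, b in zip(p0, p1))

def brute_force(p0, p1):
    """Same quantity, by enumerating every test on the finite space."""
    best = 2.0
    for test in product([0, 1], repeat=len(p0)):
        err = sum(a for a, t in zip(p0, test) if t == 1)
        err += sum(b for b, t in zip(p1, test) if t == 0)
        best = min(best, err)
    return best

def l1_distance(p0, p1):
    return sum(abs(a - b) for a, b in zip(p0, p1))

p0 = [0.5, 0.25, 0.25]   # illustrative null distribution
p1 = [0.25, 0.25, 0.5]   # illustrative mean alternative
```

Both computations agree with $1 - \frac12\|P_1 - P_0\|_1$, so a small $L_1$-distance between the null and the mean alternative forces the error sum close to one.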



The finite grid $B$ is split into subsets $B = \cup_lB_l$ with $B_l\cap B_k = \emptyset$ when $l \ne k$
and such that

$$\forall l,\ \forall\beta_1\ne\beta_2\in B_l,\quad \frac{c\log\log n}{\log n} \le |\beta_1 - \beta_2|.$$

The number of subsets $B_l$ is denoted by $K_1 = O(\log\log n)$ and the cardinality
$|B_l|$ of each subset $B_l$ is of the order $O(\log n/\log\log n)$, uniformly with respect
to $l$.
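
One simple way to obtain subsets with the required separation is to interleave a regular grid. The sketch below is a hypothetical construction given for illustration: the regular grid with step $1/\log n$, the assignment by residue modulo $K_1$ and the value of the constant `c` are assumptions, and the paper's actual splitting may differ.

```python
import math

def split_grid(beta_low, beta_high, n, c=1.0):
    """Split a regular grid of step 1/log n into K1 interleaved subsets."""
    step = 1.0 / math.log(n)
    K = int((beta_high - beta_low) / step) + 1
    grid = [beta_low + k * step for k in range(K)]
    # K1 is of order log log n; two points of the same subset are K1 steps apart.
    K1 = max(1, math.ceil(c * math.log(math.log(n))))
    subsets = [grid[l::K1] for l in range(K1)]
    return grid, subsets

grid, subsets = split_grid(1.0, 3.0, 10**6)
```

Within each subset consecutive points are at distance $K_1/\log n \ge c\log\log n/\log n$, and each subset holds about $\log n/\log\log n$ points per unit length.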

The lower bound (6) is then obtained from (21) in the following way

$$\gamma_n \ge 1 - \frac{1}{2K_1}\sum_{l=1}^{K_1}\Big\|\frac{1}{|B_l|}\sum_{\beta\in B_l}P_\beta - P_0\Big\|_1,$$

where $P_\beta = \int_\theta P_{f_{\theta,\beta}}\pi_\beta(d\theta)$.

Here we do not want to apply the triangular inequality to the whole set of
indexes $B$. Indeed, this would lead to a lower bound equal to 0. Yet, if we do not
apply some sort of triangular inequality, we cannot deal with the sum because
of too much dependency. This is why we introduced the subsets $B_l$ with the
property that two points in the same subset $B_l$ are far enough away from
each other. This technique was already used in [12] for the discrete regression
model.

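Let us denote by $\mathcal L_\beta$ the likelihood ratio

$$\mathcal L_\beta = \frac{dP_\beta}{dP_0} = \int_\theta\frac{dP_{f_{\theta,\beta}}}{dP_0}\pi_\beta(d\theta).$$

We thus have

$$\gamma_n \ge 1 - \frac{1}{2K_1}\sum_{l=1}^{K_1}\Big\|\frac{1}{|B_l|}\sum_{\beta\in B_l}\mathcal L_\beta - 1\Big\|_{L_1(P_0)}.$$

Now we use the usual inequality between $L_1$ and $L_2$-distances to get that

$$\gamma_n \ge 1 - \frac{1}{2K_1}\sum_{l=1}^{K_1}\Big\{E_0\Big(\frac{1}{|B_l|}\sum_{\beta\in B_l}\mathcal L_\beta - 1\Big)^2\Big\}^{1/2}.$$

Let us focus on the expected value appearing in the lower bound. We have

$$E_0\Big(\frac{1}{|B_l|}\sum_{\beta\in B_l}\mathcal L_\beta - 1\Big)^2 = \frac{1}{|B_l|^2}\sum_{\beta\in B_l}Q_\beta + \frac{1}{|B_l|^2}\sum_{\beta,\nu\in B_l,\ \beta\ne\nu}Q_{\beta,\nu},$$

where there are two quantities to evaluate

$$Q_\beta = E_0((\mathcal L_\beta - 1)^2) \quad\text{and}\quad Q_{\beta,\nu} = E_0(\mathcal L_\beta\mathcal L_\nu - 1).$$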



The first term $Q_\beta$ is treated as in [3]. It corresponds to the computation of
a $\chi^2$-distance between the two models induced by $P_\beta$ and $P_0$ (see term $A_2$ in
[3]). Indeed we have

$$Q_\beta \le CM_\beta n^2h_\beta^{4\beta+4\sigma+2} = \frac{C}{\rho_n^2}.$$

This upper bound goes to infinity very slowly. The number of $\beta$'s in each $B_l$
compensates this behaviour

$$\frac{1}{|B_l|^2}\sum_{\beta\in B_l}Q_\beta \le \frac{C}{|B_l|\rho_n^2} = O\Big(\frac{(\log\log n)^2}{\log n}\Big).$$

The second term is a new one (with respect to the non-adaptive case). As $G$ is
compactly supported and the points $\beta$ and $\nu$ are far away from each other, we
can prove that this term is asymptotically negligible. Recall the expression of
the likelihood ratio for a fixed $\beta$

Thus, 

J r =l \ j=l P0\ I r) 

x ( i + E ^e^ ^fvf^ l ^ (d^) (de.,,) • 

V i=l l/rj / 

/ q ry _ x . \ \ 

The random variables Y r are i.i.d. and Eo ^ilZ^M = o. Thus we have 



E {ipi v ) 



PO (Y r ) 

M fJ M v 

i + E E e h ^y p +a+1 K +a+l 

3=1 i=l,iCj 

iY 1 - x jt p) G v (Yi - x, hV ) 



En 



Vl (Yi) 



7T/3 (dQ.,p) n u (d9. )V ) . 



where the second sum concerns only some indexes $i$, denoted by $i\subset j$. This
notation stands for the set of indexes $i$ such that $[(i-1)h_\beta; ih_\beta]\cap[(j-1)h_\nu; jh_\nu] \ne
\emptyset$. From now on, we fix $\beta > \nu$. Denote by $G'$ (resp. $p_0'$) the first derivative of $G$
(resp. $p_0$). (The density $p_0$ is continuously differentiable as it is the convolution
product $f_0 * g$ where the noise density $g$ is at least continuously differentiable).

Lemma 5 For any $\beta > \nu$ and any $(i,j)\in\{1,\ldots,M_\nu\}\times\{1,\ldots,M_\beta\}$, we
have

$$E_0\Big(\frac{G_\beta(Y_1 - x_{j,\beta})\,G_\nu(Y_1 - x_{i,\nu})}{p_0^2(Y_1)}\Big) = \frac{h_\nu}{h_\beta^2}R_{ij},$$

where $R_{ij}$ satisfies

$$|R_{ij}| \le (\inf_{[0,1]}p_0)^{-1}\|G\|_\infty\|G'\|_\infty(1+o(1))$$

and $o(1)$ is uniform with respect to $(i,j)$.

The proof of this lemma is omitted. Applying Lemma 5 we get



1 + E E ^j,/3^i,uhp +a+1 h^ +cr+1 - \ n R. 

j=l i=l,icj 



(M 



2 1 l >3 



Lemma 6 Let $U$ be a real valued random variable such that $\forall k\in\mathbb N$, $E(U^{2k+1}) =
0$. We have, for any integer $n \ge 1$,

$$E(1+U)^n \le 1 + \sum_{k=1}^{\lfloor n/2\rfloor}\binom{n}{2k}E(U^{2k}),$$

where $\lfloor x\rfloor$ is the largest integer which is smaller than $x$.
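
The bound of Lemma 6 comes from the binomial expansion of $(1+U)^n$, in which every odd moment vanishes. A quick numerical check is sketched below, under the assumption that the bound reads $1 + \sum_{k=1}^{\lfloor n/2\rfloor}\binom{n}{2k}E(U^{2k})$; the choice of $U$ uniform on $\{-a,+a\}$ is a hypothetical example satisfying the moment condition.

```python
from math import comb

def lhs(a, n):
    """E(1 + U)^n for U uniform on {-a, +a}, whose odd moments all vanish."""
    return 0.5 * ((1 + a) ** n + (1 - a) ** n)

def rhs(a, n):
    """1 + sum_{k=1}^{floor(n/2)} C(n, 2k) E(U^{2k}), with E(U^{2k}) = a^{2k}."""
    return 1 + sum(comb(n, 2 * k) * a ** (2 * k) for k in range(1, n // 2 + 1))

values = [(lhs(0.3, n), rhs(0.3, n)) for n in range(1, 11)]
```

For this symmetric two-point variable the two sides coincide, which shows that the inequality is an equality whenever all odd moments are exactly zero.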

The proof is obvious and therefore omitted. Apply Lemma [6] to get the in- 
equality 



< E ^(K +a ~ lh » +a+2 ) 2k ^ ( E E ^A^, 



2/,- 



But the $\theta$'s are i.i.d. Rademacher variables and the $R_{ij}$'s are deterministic,
thus



(Mp M v 

^ E E 9j,p0i,vR 

\j=l i=l,iCj 



2k 



l<Ji,-J*<Af/3 l<ii,...,i fc <A/„ Z=l 



Using the bound on the Rij given by Lemma El 



m Mv \ , ,2k 

EJE E < (infpo)~ 1 ||G|| 00 ||G / || 00 (l + o(l)) 



u'=l i=l,icj 



[0,1] 



Indeed, each index $j_l$ may take at most $M_\beta = h_\beta^{-1}$ different values but the
constraint $i_l\subset j_l$ implies that each index $i_l$ is limited to at most $h_\beta/h_\nu$ different
values. Thus we get



LfJ „2k 



n 

k={ m 



ch1 +a+1 hr, +a+1 ^) K k 



LfJ 



h 2 



LfJ 



< C E ( n 2 hf +2a+1/2 h 2 u » +2 ° +1 / 2 -^ ] < C £ 



k=l 



k=l 



hi' 2 
o 2 h 5/2 

Pn n 



< c- 



1 W 2 



2 



Pnh 



As $\beta > \nu$ both belong to some set $B_l$, we have $\beta - \nu \ge c(\log\log n)/(\log n)$ and
according to the choice of the bandwidths,

$$\frac{h_\nu^{5/2}}{h_\beta^{5/2}} = (n\rho_n)^{-\frac{20(\beta-\nu)}{(4\beta+4\sigma+1)(4\nu+4\sigma+1)}} \le \exp\Big\{-\frac{20c\log\log n}{(4\bar\beta+4\sigma+1)^2}(1+o(1))\Big\} \le (\log n)^{-w},$$

where the constant $w$ (depending on the constant $c$ used in the construction
of the sets $B_l$) can be tailored to our need. Therefore

1 ^ C 

Q/3,u < 



which goes to 0 as $n$ goes to $+\infty$. We finally obtain the upper bound



E 



which leads to 



i° im) + bnkw) f- l+ 0(1) - 



Proof of Proposition 1. We fix $\varepsilon > 0$. Now,

$$P_{f,s}(|\hat s_n - s| > \varepsilon) \le P_{f,s}(\hat s_n \ne s_n(s)) + P_{f,s}(|s - s_n(s)| > \varepsilon).$$

As $|s - s_n(s)| \le d_n$ which converges to zero, we get that for large enough $n$,
the term $P_{f,s}(|s - s_n(s)| > \varepsilon)$ is equal to zero. Let us now consider the term
$P_{f,s}(\hat s_n \ne s_n(s)) = P_{f,s}(\hat s_n > s_n(s)) + P_{f,s}(\hat s_n < s_n(s))$. Now, $s_n(s)$ is equal to
some $s_k$ (using the labeling among the points of the grid). We have



fc-i 

P/, s (s n < s fc ) = 51 F /,s(£«. = «j 
3=1 
fc-i 

3=1 
fc-l 

p*. ( i<§> p r?^ - $^11 > 



As |$ p (w n )| > g ( g/(M n )$ 9 (M„) for large enough n, we get 



!>/,• (l^(«»)l>5{^ w + ^ +1] }0 
3=1 v z 

fc— 1 1 
P/,.(*n < S k ) < EP/, S (\K(u n ) - $> n )| > \ + S^} K)(l + 0(1)) 



J'=l 
fc-1 



J'=l 



< ]T exp (A 2 ^' exp(-2<0 + exp(-2<^ 



< AT exp (--exp(-2<) 



Now consider the 

A' 



j=k+l v z 
j=*!+i v 2 

< 7VP /iS f|l»^K) - $> n )| > qp(u n ){&(u n ) - U W K)} + o(g^K)^K)) 



as |$ p (-u„)| > qpr(u n )^ 9 (u n ) for large enough n and j '• — 1 > k . According to 
the choice of the grid, we have | s — | < d n and d n logw n — > 0, which implies 



$ 9 K) - \® k {u n ) =exp(-<) (l - iexp[< fc (« 



S-Sfc 

n 



1)] 



: exp(-<) ( 1 - 2 ex P[< fc ( s - s k ) logw„(l + o(l))] 



>exp(-<) ^l--exp«^(l + o(l))) 
>iexp(-<)(l+o(l)), 



where the first inequality comes from d n logw n < \ s . This gives 



P/, s (*n > s fc ) <iVP/, s (J$£K) - $ P K)| > -g^K)^K)(l +o(l)) 
<iVexp L^! nM - 2/3 'exp(-2<)(l + o(l))") . 



In conclusion, as soon as we have d n log-u„ < u n s , and logiV = o((logn) a ) 
(which is ensured by our choice of d n ) we get, for any e > and large enough 

n, 

P/, s (|s n - s\ > e) < iVexp {-^nu^ exp(-2<)(l + o(l))^ 

<exp L^(logn) Q (l + o(l))V 
The last term gives a convergent series and then, according to the Borel-Cantelli
lemma, $P_{f,s}(|\hat s_n - s| > \varepsilon \text{ i.o.}) = 0$, leading to the almost sure convergence of
$\hat s_n$. ■



Proof of Corollary 1. Note that the new choice of $d_n$ still satisfies the
requirements for Proposition 1 to be valid. We introduce respectively $\bar h_n$, the
non-random version of the bandwidth $\hat h_n$, and $\bar K_n$, the non-random version of
the kernel $\hat K_n$, both constructed with self-similarity index $s_n(s)$. The Fourier
transform of $\bar K_n$ thus satisfies

$$\Phi^{\bar K_n}(u) = \exp\big((|u|/\bar h_n)^{s_n(s)}\big)1_{|u|\le1},$$

where $\bar h_n = \big(2^{-1}\log n - (\bar\beta - s_n(s) + 1/2)\log\log n/s_n(s)\big)^{-1/s_n(s)}$.

We also introduce the corresponding (classical) estimator

$$\bar f_n(x) = (n\bar h_n)^{-1}\sum_{i=1}^{n}\bar K_n\big(\bar h_n^{-1}(x - Y_i)\big).$$
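
In the Fourier domain this kernel estimator reads $\frac{1}{2\pi}\int_{|u|\le1/h}e^{-iux}e^{|u|^s}\hat\Phi_n(u)\,du$, where $\hat\Phi_n$ is the empirical characteristic function of the observations. The sketch below evaluates it with a plain trapezoidal rule. It is an illustration under assumptions: the noise characteristic function is taken to be $\exp(-|u|^s)$, and the function name, the quadrature and the numerical values are not from the paper.

```python
import math

def deconv_density(x, sample, s, h, grid=2001):
    """Trapezoidal approximation of the deconvolution kernel estimator at x.

    Uses (1/2pi) * integral over |u| <= 1/h of Re[e^{-iux} Phi_n(u)] e^{|u|^s} du;
    the imaginary part integrates to zero over the symmetric interval.
    """
    umax = 1.0 / h
    du = 2 * umax / (grid - 1)
    n = len(sample)
    total = 0.0
    for m in range(grid):
        u = -umax + m * du
        # Real part of exp(-iux) times the empirical characteristic function.
        re = sum(math.cos(u * (y - x)) for y in sample) / n
        weight = 0.5 if m in (0, grid - 1) else 1.0
        total += weight * re * math.exp(abs(u) ** s)
    return total * du / (2 * math.pi)

sample = [-1.0, 0.0, 1.0]  # toy symmetric sample
a = deconv_density(0.4, sample, 1.0, 0.5)
b = deconv_density(-0.4, sample, 1.0, 0.5)
```

The factor $e^{|u|^s}$ grows very fast as $h$ decreases, which is why the bandwidth can only shrink at a logarithmic rate and why an overestimated index is harmful.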

Note that obviously, $s_n(s)$, $\bar K_n$ and $\bar h_n$ are unknown to the statistician. These
objects are used only as tools to assess the convergence of the procedure. Now,
remark that we have

$$E_{f,s}[|\hat f_n(x) - f(x)|^2] = E_{f,s}[|\hat f_n(x) - f(x)|^2\,1_{\hat s_n = s_n(s)}] + E_{f,s}[|\hat f_n(x) - f(x)|^2\,1_{\hat s_n \ne s_n(s)}]$$
$$= T_1 + T_2,$$

say. Let us focus on the first term

$$T_1 \le E_{f,s}[|\bar f_n(x) - f(x)|^2] = \{E_{f,s}[\bar f_n(x)] - f(x)\}^2 + \mathrm{Var}_{f,s}\{\bar f_n(x)\},$$

introducing the bias and the variance of the estimator $\bar f_n(x)$. The important
thing to note is that the kernel estimator $\bar f_n$ uses parameter $s_n(s)$ which is not
equal to the true one $s$. Thus $T_1$ is not the classical risk for kernel estimator
with known index $s$. Using Parseval's equality



$$\{E_{f,s}[\bar f_n(x)] - f(x)\}^2 = \frac{1}{4\pi^2}\Big|\int e^{-iux}\Phi(u)\big(1_{|u|\le1/\bar h_n}\exp(-|u|^s + |u|^{s_n(s)}) - 1\big)du\Big|^2$$
$$\le \frac{1}{4\pi^2}\Big(\int_{|u|\le1/\bar h_n}|\Phi(u)|\,\big|\exp(-|u|^s + |u|^{s_n(s)}) - 1\big|\,du + \int_{|u|>1/\bar h_n}|\Phi(u)|\,du\Big)^2.$$

The second term in the right hand side is the classical bias and equals $O(\bar h_n^{\beta-1/2})$.
As soon as $d_n\bar h_n^{-s}\log(1/\bar h_n)$ converges to zero, we can use the following
development in the first term, uniformly for $|u| \le 1/\bar h_n$,

$$\exp(-|u|^s + |u|^{s_n(s)}) - 1 = \exp\{|u|^s(s_n(s) - s)\log|u|(1+o(1))\} - 1$$
$$= |u|^s(s_n(s) - s)\log|u|(1+o(1)),$$

which leads to



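$$\{E_{f,s}[\bar f_n(x)] - f(x)\}^2$$
$$\le \frac{1}{4\pi^2}\Big(\int_{|u|\le1/\bar h_n}|\Phi(u)|\,|u|^s\,|s_n(s) - s|\,\log|u|\,du\Big)^2(1+o(1)) + O(\bar h_n^{2\beta-1})$$
$$\le O(d_n^2)1_{\beta>s+1/2} + O\big(d_n^2\bar h_n^{2\beta-2s-1}\log^2(1/\bar h_n)\big)1_{\beta\le s+1/2} + O(\bar h_n^{2\beta-1}).$$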



It can be easily seen that 



W(l/ftn) 
0(l)(logn) 2s / s "( s )(loglog 



log n(loglogn) 2 
<0(l)(logn) 2dn/ * < 0(l)exp{2s/(slogn)} = 0(1), 

leading to 

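$$\{E_{f,s}[\bar f_n(x)] - f(x)\}^2 \le O(d_n^2)1_{\beta>s+1/2} + O(\bar h_n^{2\beta-1}).$$

Moreover, when $\beta > s + 1/2$, we use $d_n^2 \le (\log n)^{-(2\bar\beta-1)/\underline s} = O(\bar h_n^{2\beta-1})$. With
this choice of $d_n$, we thus ensure that in any case

$$\{E_{f,s}[\bar f_n(x)] - f(x)\}^2 \le O(\bar h_n^{2\beta-1}).$$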

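The variance of $\bar f_n(x)$ is bounded by

$$\mathrm{Var}_{f,s}\{\bar f_n(x)\} = \frac{1}{4\pi^2n}E_{f,s}\Big|\int_{|u|\le1/\bar h_n}e^{-iux}e^{|u|^{s_n(s)}}\big(e^{iuY} - \Phi^p(u)\big)du\Big|^2$$
$$\le \frac{1}{\pi^2n}\Big(\int_{|u|\le1/\bar h_n}e^{|u|^{s_n(s)}}du\Big)^2 = O\Big(\frac{\bar h_n^{2(s_n(s)-1)}\exp(2/\bar h_n^{s_n(s)})}{n}\Big).$$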



We finally get the bound 

'^nW-lJgjjp^/^W)' 



n 
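
The balance between this variance bound and the target rate can be checked numerically. The sketch below works on the logarithmic scale and assumes the bandwidth $h = (\frac12\log n - \frac{\bar\beta - s + 1/2}{s}\log\log n)^{-1/s}$ together with a variance bound of order $h^{2(s-1)}\exp(2/h^s)/n$; the function names and numerical values are illustrative assumptions.

```python
import math

def bandwidth(log_n, beta_bar, s):
    """Assumed bandwidth, written in terms of log n to avoid overflow."""
    return (0.5 * log_n - (beta_bar - s + 0.5) / s * math.log(log_n)) ** (-1.0 / s)

def log_variance_bound(log_n, beta_bar, s):
    """Logarithm of h^{2(s-1)} * exp(2 / h^s) / n."""
    h = bandwidth(log_n, beta_bar, s)
    return 2 * (s - 1) * math.log(h) + 2.0 / h ** s - log_n

def log_rate(log_n, beta_bar, s):
    """Logarithm of (log n)^{-(2 beta_bar - 1) / s}."""
    return -(2 * beta_bar - 1) / s * math.log(log_n)

# Illustrative values: log n = 1000, beta_bar = 2, s = 1.5.
gap = log_variance_bound(1000.0, 2.0, 1.5) - log_rate(1000.0, 2.0, 1.5)
```

The gap stays bounded (it tends to $\frac{2(s-1)}{s}\log 2$), so under these assumptions the variance term is of the same order as $(\log n)^{-(2\bar\beta-1)/s}$ and never dominates the squared bias.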



Now, we prove that the second term $T_2$ is negligible in front of the main term
$T_1$, by using Proposition 1 and uniform bounds on $|\hat f_n(x)|$ and $|f(x)|$. First,

$$|\hat f_n(x)| \le \frac{1}{2\pi}\int e^{|t|^{\hat s_n}}1_{|t|\le1/\hat h_n}\,dt = O\big(\hat h_n^{\hat s_n-1}\exp\{1/\hat h_n^{\hat s_n}\}\big)$$

< O ( 1) (log n) {1 ~ s) ^ exp { (log n) 
|/(x)| < / m)\dt = 0(J(1 + Itn-'dt) = 0(1), 



and then 



T 2 = 0((logn) 2 ( 1 - s ")^exp{2(logn)^})P / , s (s n ^ s n (s)) 

= ^(logn) 2(1 - 5) ^exp ^2(logn) s_/2 - ^(logn) Q (l + o(l))^ . 

As soon as we choose $a > \bar s/\underline s$, this second term $T_2$ will be negligible in front
of $T_1$. In conclusion,

$$E_{f,s}[|\hat f_n(x) - f(x)|^2] = O(\bar h_n^{2\beta-1}) + O\Big(\frac{\bar h_n^{2(s_n(s)-1)}\exp(2/\bar h_n^{s_n(s)})}{n}\Big)$$
$$= O\big((\log n)^{-(2\beta-1)/s_n(s)}\big) = O\big((\log n)^{-(2\beta-1)/s}\big).$$

A Technical Proofs



Proof of Lemma 2. Using a Markov inequality and the usual controls on
bias and variance, we get



^o(\T n ,N+i~ ^o(T n ^ + i) | >C t n ^ N+1 — 



i , 



and by choosing $C^*$ large enough, this term is smaller than some $\varepsilon > 0$. ■

Proof of Lemma 4. Let us write

$$P_f(|T_{n,N+1}| \le C^*t_{n,N+1}^2) \le P_f\big(|T_{n,N+1} - E_fT_{n,N+1}| \ge \|f-f_0\|_2^2 - C^*t_{n,N+1}^2 - B_f(T_{n,N+1})\big)$$

where

$$|B_f(T_{n,N+1})| = \big|E_f(T_{n,N+1}) - \|f-f_0\|_2^2\big|$$
$$\le \int_{|u|>1/h_{N+1}}|\Phi(u)|^2du + 2\Big(\int_{|u|>1/h_{N+1}}|\Phi(u)|^2du\int_{|u|>1/h_{N+1}}|\Phi_0(u)|^2du\Big)^{1/2}$$
$$\le \big[L(h_{N+1})^{2\beta}\exp\{-2\alpha/(h_{N+1})^r\} + 2L(h_{N+1})^{\beta+\bar\beta}\exp\{-\alpha/(h_{N+1})^r\}\big](1+o(1))$$
$$\le 2L(h_{N+1})^{\beta+\bar\beta}\exp\{-\alpha/(h_{N+1})^r\}(1+o(1)).$$

In the same way as in the proof of Lemma 3, we have

$$E_f(T_{n,N+1} - E_f(T_{n,N+1}))^2 \le \frac{C}{n^2(h_{N+1})^{4\sigma+1}} + \frac{4\Omega_g^2(f-f_0)}{n} = w_{n,f}^2,$$

and $\Omega_g(f-f_0)$ is a constant depending on $f$ and $g$ (but not $n$) and satisfying
$|\Omega_g^2(f-f_0)| \le C\|f-f_0\|_2^{2-2\sigma/\beta}$. The rest of the proof follows the same lines
as Lemma 3. Indeed, Markov's inequality leads to the following bound on the
second type error term

<f 

(11/ - fo\\l - C**5U+i " 2L(h^)^exp{-a/(h^Y}(l + o(l))) 2 

' Cn- 2 ^ 1 )-^- 1 C 



< max 



(c° - c*)x, 'n||/ - /o||^ +2<T//! (co - c*y 



The first term in the right hand side is a constant which can be as small as
we need, by choosing a large enough constant $C^\circ$. The second term converges
to zero. ■



Proof of Lemma 5. As $\beta > \nu$, the bandwidths satisfy $h_\nu h_\beta^{-1} = o(1)$. Then,
as $G$ is compactly supported on $[-1,0]$, we have



pU y i) J J* Po(v) 



Gp {h v u + x i)V - 3Cjj3) G (u) ^ 
-i,o] p (h v u + x ijV ) 



Apply the Taylor Formula to get 



Gp (h v u + x i>u - x jjP ) = Gp (x i>u - x jt p) + j^uG' ^ huUl + x ^ u — 5z£ 
, 1 _ 1 Po {Ku 2 + x i>v ) 

Pq [h v U + X ijV ) Po{Xi, u ) po (h u U 2 + X i)V ) 

where < u\ < u and < u 2 < u. As / G = 0, we obtain 

Gp {h v u + x i:V - Xj/) G (u) ^ 

•i,q] Po {Ku + x i>v ) 

1 K f , [ 'h u ii 1 + x i>v - xjA 

' uG — 5 — G(u)du 



Po hp J [-1,0] \ hp 

-h v Gp [x iyU - x jt p) / — — : -z-uG (u) du 

J l-ho] Po [h v u 2 + x ijV ) 

- § f Mm±2^ g> ( huUi + x^-xjA Q {u) ^ 
hp J\-i,o] po (h u u 2 + Xi 

,v) \ P / 

This leads to 

( G^-x^GAYi-x^ _ K „ 



where 



-,/ / hyU\ ~\- X{^ ^j,f3 



Ri j = ; . / uG' 1/1 T''" J ' p 
Po (av) ■ , [- 1 >°] V fya / 

\ hp J J[-i,o] Po (h y u 2 + a*,,,) 

— —u G t- 2 — G (u) du. 

■1.0] Po (Ku 2 + x i v ) \ hp J 
satisfies

\Riil < (inf ^o)^ 1 !^!!^!!^'!!^ + ||<^||oc,||^o llooC^nf ] ^ )^ 2 (^/ 3 ||G'||oo + ^z.||G"| 
which ends the proof of Lemma 5. ■



Proof of Theorem [21 

Assume now that f G J r (a,f,p ,L), for some /3 G \P,0\- The proof follows 
the same lines as the proof of Theorem 1.

For the first-type error we write 



Ni 

P (A; = 1) = £P (|T„, - E (T nji )| > CX,i - MT n ,i)) 

i=l 

N 2 

+ Yl F (\ T n,i - M T n,i)\ > CX,i - E (T nii )). 

i-JVi=l 

For the first Ni terms we apply Lemma[T]with Eo(T n> j) = o(l)L(/ij) 2 ^ exp(— 2a/hl) 
which is smaller than t\ i for all % = 1, . . . , N% and the same result follows. For 
the last N2 terms we also use the Berry-Esseen inequality as in the proof of 
Lemma [I] for 

x = C*t 2 nti - E (T n ,) > CH 2 n>i (l - o(l)) 
asE (T nii ) = o(l)hf exp(-2a/h$) = o(l/n). We get x/v n = 0(1) (log log log n) 1 / 2 



N 2 

£ M\T n ,i - MT n>i )\ > CH 2 ni - E (T n>i )) 

i-N\=l 

^ N v n ( ^ n (logloglogn)- 1 / 2 

< JT - exp — - < 61 rf— ; = Oil , 

" 2 C*t 2 , P V 4t, 2 J ~ (loglogn)^ 1 1 h 

for some b > 1 for C* large enough. Indeed, all other calculations are similar as 
they are related mostly to the distribution of the noise which didn't change. 

As for the second-type error, 



sup sup P/(A* = 0) 

< l I=f= o sup sup P/(V1 < % < N h \T nti \< C*t' 2 n 

\\f-h\\i>wi, W) 

+ lr>o sup sup P/(VJVi + 1 < i < Ni + N 2 , \T n>i \< C*t 2 n>i 

rG[r;f],ae[a,a],/3G[/3,/3] /G^"(r,L) 

ll/-/olll>^, T 






For the first term in the previous sum we actually apply precisely Lemma 3.
For the second term we mimic the proof of Lemma 3 and choose some $f$ in
$\mathcal F(\alpha,r,\beta,L)$ such that $\|f-f_0\|_2^2 \ge C\psi_{n,r}^2$, where we denote $\psi_{n,r} = \psi_{n,\tau}1_{r>0}$. We
define $r_f$ as the smallest point on the grid $\{r_1,\ldots,r_{N_2}\}$ such that $r \le r_f$. We
denote by $h_f$, $t_{n,f}^2$ and $T_{n,f}$ the bandwidth, the threshold and the test statistic
associated to parameters $\alpha$ and $r_f$ (they do not depend on $\beta$). Then

$$P_f(\forall N_1+1\le i\le N_1+N_2,\ |T_{n,i}| \le C^*t_{n,i}^2)$$
$$\le P_f\big(|T_{n,f} - E_f(T_{n,f})| \ge \|f-f_0\|_2^2 - C^*t_{n,f}^2 - B_f(T_{n,f})\big), \qquad (A.1)$$

where, as in Theorem 1



\B f (T n , f )\ = \\\J h *f- f\\t + 2(f -J h *f, f )\ 

< (Lhf exp(-2a/h r f ) + 2Lhf^ exp(-a/h r f - a/h})) (1 + o(l)) 

< L(hf + hP f +Po ) exp(-2a/h r f )(l + o(l)) 



< L(h^ A(3 °) exp(-2a/^)(l + o(l)). 



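Using Markov's inequality, we get the following upper bound for (A.1)

$$\frac{\mathrm{Var}_f(T_{n,f})}{\big(\|f-f_0\|_2^2 - C^*t_{n,f}^2 - B_f(T_{n,f})\big)^2}. \qquad (A.2)$$

The variance is bounded from above by

$$E_f(T_{n,f} - E_f(T_{n,f}))^2 \le \frac{C}{n^2h_f^{4\sigma+1}} + \frac{4\Omega_g^2(f-f_0)}{n} \qquad (A.3)$$

and similarly to [3] we show that $\Omega_g^2(f-f_0) \le \|f-f_0\|_2^2(\log\|f-f_0\|_2^{-2})^{2\sigma/r}$.
We have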

^^(logn)^^- 1 ^/ 2 ^!, 
and thus ||/ - / ||| - CH 2 nJ > (C - C*)^ r . Moreover, 



Bf(T nJ )^- 2 r < C(logloglogn)- 1 / 2 (logn) 



-(/3+/3A/3 )/r / -(4 CT +l)/(2r) 



, / log n \ r//r/ 
x exp \ —2a I — - — I + log n 



The construction of the grid ensures that $-1/(\log\log n) \le r - r_f \le 0$ and thus
exp < —2a 



logn 
2c 



r/rr 



logn 



= exp 
< exp 



logn 

c 

logn 



n ox]) | loglogn(l + o(l)) ] — c 



a exp ( — (1 + o(l)) ) — c 



0(1), 



as we chose the constant $c < \alpha\exp(-1/r)$. Finally, we have $B_f(T_{n,f})\psi_{n,r}^{-2} =
o(1)$. Let us come back to (A.2). We distinguish two cases whether the first
or the second term in (A.3) is dominant. If the first term in the variance
dominates, we have the following bound for (A.2)



\[
\frac{n^{-2}h_f^{-(4\sigma+1)}}{(C-C^*)^2\,\psi_{n,r}^4} \le \frac{C}{\log\log\log n}.
\]



On the other hand, if the second term in (A.3) is the larger one, the bound
(A.2) becomes
\[
\frac{1}{n}\,\frac{\|f-f_0\|_2^2\,(\log\|f-f_0\|_2^{-2})^{2\sigma/r}}{\|f-f_0\|_2^4\,(1-C^*/C+o(1))^2}
\le C\,n^{-1}\psi_{n,r}^{-2}\,(\log\psi_{n,r}^{-2})^{2\sigma/r}
\le C(\log n)^{-1/(2r)}(\log\log\log n)^{-1/2} = o(1).
\]
This finishes the proof.
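The grid step in the proof above rewrites the exponent $-2\alpha(\log n/(2c))^{r/r_f}+\log n$ in a factored form and argues that it tends to minus infinity when $c<\alpha\exp(-1/r)$. The following is a minimal numerical sketch of that step, not part of the original argument. It assumes the worst grid point allowed by the construction, $r_f = r + 1/\log\log n$, and the function name and the parameter values in the usage note are hypothetical.

```python
import math


def grid_exponent(log_n, alpha, r, c):
    """Return the exponent from the proof in its direct and factored forms.

    Assumption (for illustration only): the grid point is as far from r as
    the construction allows, i.e. r_f = r + 1 / log log n.
    """
    r_f = r + 1.0 / math.log(log_n)
    # Direct form: -2 * alpha * (log n / (2c))**(r / r_f) + log n.
    direct = -2.0 * alpha * (log_n / (2.0 * c)) ** (r / r_f) + log_n
    # Factored form: -(log n / c) * (alpha * (log n / (2c))**(r / r_f - 1) - c).
    bracket = alpha * (log_n / (2.0 * c)) ** (r / r_f - 1.0) - c
    factored = -(log_n / c) * bracket
    return direct, factored
```

For instance, with `alpha=1.0`, `r=1.0` and `c=0.2` (which satisfies `c < alpha * exp(-1/r)`), calling `grid_exponent(20.0, 1.0, 1.0, 0.2)` and `grid_exponent(1000.0, 1.0, 1.0, 0.2)` returns two equal forms that are both negative, and the exponent is more negative for the larger value of `log_n`.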



Proof of Corollary 2. We keep the same notation as in Subsection 2.2
and denote by $I$ the functional $\int f^2$. In the same way as in the proof
of Corollary 1, we write
\[
E_{f,s}\big[|\hat T_n - I|^2\big] \le E_{f,s}\big[|T_n - I|^2\big] + E_{f,s}\big[|\hat T_n - I|^2\,\mathbf 1_{\{\hat s_n \ne s_n(s)\}}\big]. \tag{A.4}
\]



Let us first focus on the first term appearing in the right-hand side of (A.4).
We split it into the square of a bias term plus a variance term. The bias is
bounded by
\[
|E_{f,s}T_n - I| \le \frac{1}{2\pi}\Big(\int_{|u|>1/h_n}|\Phi(u)|^2\,du + \int_{|u|\le 1/h_n}\big|\exp(2|u|^{s_n(s)}-2|u|^s)-1\big|\,|\Phi(u)|^2\,du\Big)
\]
\[
\le O(h_n^{2\beta}) + \int_{|u|\le 1/h_n} 2|u|^s\,|s_n(s)-s|\,\big|\log|u|\big|\,|\Phi(u)|^2\,du
\]
\[
\le O(h_n^{2\beta}) + O(d_n)\mathbf 1_{\{s\le 2\beta\}} + O\big(h_n^{2\beta-s}\log(1/h_n)\,d_n\big)\mathbf 1_{\{s>2\beta\}}.
\]



As in the proof of Corollary 1, we have $d_n h_n^{-s}\log(1/h_n) = o(1)$ and thus,
using that $d_n \le (\log n)^{-2\bar\beta/\underline s} = O((\log n)^{-2\beta/s})$, we finally get
\[
|E_{f,s}T_n - I| \le O\big((\log n)^{-2\beta/s}\big).
\]
Concerning the variance term, we easily get
\[
\mathrm{Var}_{f,s}(T_n) \le \frac{C_1}{n^2}\,h_n^{s_n(s)-1}\exp\big(4/h_n^{s_n(s)}\big) + \frac{C_2}{n}\,h_n^{2\beta+s_n(s)-1}\exp\big(2/h_n^{s_n(s)}\big),
\]
where $C_1$ and $C_2$ are positive constants (we refer to [3], Theorem 4 for more
details). Using the form of the bandwidth $h_n$, we have
\[
\mathrm{Var}_{f,s}(T_n) \le O\big((\log n)^{-4\beta/s}\big).
\]



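Let us now focus on the second term appearing in the right-hand side of (A.4).
Denoting by $\underline h = (\log n/2)^{-1/\underline s}$, we have
\[
|\hat T_n| \le \frac{1}{2\pi}\int_{|u|\le 1/\underline h}\exp(2|u|^{\bar s})\,du = O\big(\underline h^{\bar s-1}\exp(2/\underline h^{\bar s})\big).
\]
Moreover,
\[
I = \|f\|_2^2 = \frac{1}{2\pi}\|\Phi\|_2^2.
\]
This leads to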

E/,,[|r»--rri{ Sn ^„ w} ] <G^y ""«ip W^j '*} P/,.(i„ / »„(»)) 

£ C (^J'^exp {2 (!fi)" 2 } exp (-£<log„)-<l + «!))) , 

and this term is negligible compared with the first term appearing in the right-hand
side of (A.4) as soon as $a > \bar s/\underline s$. This leads to the result. ■
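The proof above relies on the order of the integral of $\exp(2|u|^{s})$ over $|u|\le 1/h$, namely $O(h^{s-1}\exp(2/h^{s}))$, together with a bandwidth of the form $h=(\log n/2)^{-1/s}$, for which $\exp(2/h^{s})=n$. The following is a minimal numerical sketch of both facts for a single generic exponent `s`. It is an illustration only: the function names and the trapezoid rule are assumptions and do not come from the paper.

```python
import math


def bandwidth(n, s):
    """Bandwidth of the form h = (log n / 2)**(-1/s), so that exp(2 / h**s) = n."""
    return (math.log(n) / 2.0) ** (-1.0 / s)


def integral_over_order(s, h, steps=200_000):
    """Ratio of the integral of exp(2|u|^s) over |u| <= 1/h to h**(s-1) * exp(2/h**s).

    The integral is approximated with the trapezoid rule on [0, 1/h] and
    doubled, since the integrand is symmetric in u.
    """
    upper = 1.0 / h
    du = upper / steps
    total = 0.5 * (1.0 + math.exp(2.0 * upper ** s))  # endpoints u = 0 and u = 1/h
    for k in range(1, steps):
        total += math.exp(2.0 * (k * du) ** s)
    integral = 2.0 * total * du
    order = h ** (s - 1.0) * math.exp(2.0 / h ** s)
    return integral / order
```

The ratio returned by `integral_over_order` stays bounded as `h` decreases (an integration by parts suggests it approaches `1/s`), which is what the $O(\cdot)$ statement requires.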



Proof of Corollary 3. We use the same notation as in Subsection 3.2.
Moreover, $T_n^0$ is the test statistic constructed with the deterministic kernel $K_n$
and the deterministic bandwidth $h_n$; and $t_n^2$ is the threshold defined with the
parameter value $s_n(s)$ for the self-similarity index. The first type error of the
test is controlled by
\[
P_{f_0,s}(\Delta^* = 1) = P_{f_0,s}\big(|\hat T_n^0|\,\hat t_n^{-2} > C^*\big) \le P_{f_0,s}\big(\hat s_n \ne s_n(s)\big) + P_{f_0,s}\big(|T_n^0|\,t_n^{-2} > C^*\big).
\]
The first term on the right-hand side of this inequality converges to zero
according to Proposition 1. Let us focus on the second term. We have
\[
P_{f_0,s}\big(|T_n^0|\,t_n^{-2} > C^*\big) \le \frac{t_n^{-4}}{(C^*)^2}\,E_{f_0,s}(T_n^0)^2 \le \frac{t_n^{-4}}{(C^*)^2}\Big\{(E_{f_0,s}T_n^0)^2 + \mathrm{Var}_{f_0,s}T_n^0\Big\}.
\]

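It is easily seen that
\[
E_{f_0,s}T_n^0 = \frac{1}{2\pi}\int_{|u|\le 1/h_n}|\Phi_0(u)|^2\,\big|\exp(|u|^{s_n(s)}-|u|^s)-1\big|^2\,du + \frac{1}{2\pi}\int_{|u|>1/h_n}|\Phi_0(u)|^2\,du
\]
\[
\le \frac{d_n^2}{2\pi}\Big(\int_{|u|\le 1/h_n}|\Phi_0(u)|^2\,|u|^{2s}\log^2|u|\,du\Big)(1+o(1)) + O(h_n^{2\beta_0})
\]
\[
\le O(d_n^2)\mathbf 1_{\beta_0>s} + O(h_n^{2\beta_0}) = O(h_n^{2\beta_0}),
\]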



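the inequalities being valid as soon as $h_n^{-s}d_n\log(1/h_n)$ converges to zero. As
in the proof of Theorem 4 in [3], we can show that
\[
\mathrm{Var}_{f_0,s}(T_n^0) \le O(1)\frac{h_n^{s_n(s)-1}}{n^2}\exp\big(4/h_n^{s_n(s)}\big) + O(1)\frac{h_n^{2\beta_0+s_n(s)-1}}{n}\exp\big(2/h_n^{s_n(s)}\big).
\]
Finally, we get
\[
P_{f_0,s}\big(|T_n^0|\,t_n^{-2} > C^*\big) \le \frac{t_n^{-4}}{(C^*)^2}\Big\{O(h_n^{4\beta_0}) + O(1)\frac{h_n^{s_n(s)-1}}{n^2}\exp\big(4/h_n^{s_n(s)}\big) + O(1)\frac{h_n^{2\beta_0+s_n(s)-1}}{n}\exp\big(2/h_n^{s_n(s)}\big)\Big\} \le \frac{O(1)}{(C^*)^2}.
\]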



Choosing $C^*$ large enough achieves the control of the first error term. We now
turn to the second error term. Under hypothesis $H_1(C,\psi_n)$, there exists some
$\beta$ such that $f$ belongs to $\mathcal F(0,0,\beta,L)$ and $\|f-f_0\|_2^2 \ge C\psi_{n,\beta}^2$. We write
\[
P_{f,s}(\Delta^* = 0) = P_{f,s}\big(|\hat T_n^0|\,\hat t_n^{-2} \le C^*\big) \le P_{f,s}\big(\hat s_n \ne s_n(s)\big) + P_{f,s}\big(|T_n^0|\,t_n^{-2} \le C^*\big).
\]
As already seen, the first term on the right-hand side of this inequality converges
to zero, so we only deal with the second one. We define
$B_{f,s}(T_n^0) = E_{f,s}T_n^0 - \|f-f_0\|_2^2$. Thus
\[
P_{f,s}\big(|T_n^0|\,t_n^{-2} \le C^*\big) \le P_{f,s}\big(|T_n^0 - E_{f,s}T_n^0| \ge \|f-f_0\|_2^2 - C^*t_n^2 + B_{f,s}(T_n^0)\big)
\le \frac{\mathrm{Var}_{f,s}(T_n^0)}{\big(\|f-f_0\|_2^2 - C^*t_n^2 + B_{f,s}(T_n^0)\big)^2}. \tag{A.5}
\]

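We compute this bias term $B_{f,s}(T_n^0)$:
\[
B_{f,s}(T_n^0) = \frac{1}{2\pi}\int\big|\exp(|u|^{s_n(s)}-|u|^s)\,\Phi(u)\mathbf 1_{|u|\le 1/h_n} - \Phi_0(u)\big|^2\,du - \frac{1}{2\pi}\int|\Phi(u)-\Phi_0(u)|^2\,du
\]
\[
\le \frac{1}{2\pi}\int_{|u|\le 1/h_n}\big|[\exp(|u|^{s_n(s)}-|u|^s)-1]\,\Phi(u)\big|^2\,du + \frac{1}{2\pi}\int_{|u|>1/h_n}|\Phi(u)|^2\,du
\]
\[
\le \frac{d_n^2}{2\pi}(1+o(1))\int_{|u|\le 1/h_n}|u|^{2s}\log^2|u|\,|\Phi(u)|^2\,du + O(h_n^{2\beta})
\]
\[
\le O(d_n^2)\mathbf 1_{\beta>s} + O(h_n^{2\beta}) = O(h_n^{2\beta}).
\]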
In fact, there exists some constant $C_1 > 0$ depending only on $L$ and on the
noise distribution such that $|B_{f,s}(T_n^0)| \le C_1 h_n^{2\beta}$. Under hypothesis $H_1(C,\psi_n)$,
we also have $\|f-f_0\|_2^2 \ge C\psi_{n,\beta}^2$. Thus,
\[
\|f-f_0\|_2^2 - C^*t_n^2 + B_{f,s}(T_n^0) \ge C\psi_{n,\beta}^2 - C^*t_n^2 - C_1\Big(\frac{\log n}{2}\Big)^{-2\beta/s}
\ge a\Big(\frac{\log n}{2}\Big)^{-2\beta/s},
\]
where $a = C - C^* - C_1$ is positive whenever $C > C^0 := C^* + C_1$. Returning to



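(A.5), we get
\[
P_{f,s}\big(|T_n^0|\,t_n^{-2} \le C^*\big) \le \frac{1}{a^2}\Big(\frac{\log n}{2}\Big)^{4\beta/s}\mathrm{Var}_{f,s}(T_n^0).
\]
Computation of the variance follows the same lines as under hypothesis $H_0$.
We obtain
\[
\mathrm{Var}_{f,s}(T_n^0) \le O(1)\frac{h_n^{s_n(s)-1}}{n}\exp\big(2/h_n^{s_n(s)}\big)\Big(h_n^{2\beta} + \frac{1}{n}\exp\big(2/h_n^{s_n(s)}\big)\Big).
\]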



The choice of the bandwidth ensures that the second type error term converges 
to zero. ■ 
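The first-type error control in the proof above bounds a tail probability by a second moment, split into a squared mean plus a variance. The following is a minimal sketch of that inequality on simulated data. It is an illustration only: a Gaussian variable stands in for the test statistic (the statistic in the paper is not Gaussian), and the function names and parameter values are hypothetical.

```python
import random


def second_moment_bound(mean, variance, threshold):
    """Markov bound on P(|X| > threshold): (mean**2 + variance) / threshold**2."""
    return (mean ** 2 + variance) / threshold ** 2


def empirical_tail(mean, sd, threshold, draws=100_000, seed=0):
    """Fraction of simulated Gaussian draws whose absolute value exceeds threshold.

    The Gaussian law is an assumption made for illustration only.
    """
    rng = random.Random(seed)
    hits = sum(1 for _ in range(draws) if abs(rng.gauss(mean, sd)) > threshold)
    return hits / draws
```

With `mean=0.5`, `sd=1.0` and `threshold=3.0`, the bound equals `1.25 / 9` and the simulated tail frequency falls well below it. In the proof the threshold role is played by $C^* t_n^2$, so that the bound decreases like $1/(C^*)^2$.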



Proof of Theorem 3. This proof follows the lines of Theorem 3.9 in [14].
Combining the Skorokhod representation theorem and Lemma 3.3 in [14],
there exists a nonnegative random variable $T_n$ such that for any $0 < \varepsilon < 1/2$
and any real $x$,
\[
|P(U_n \le x)-\phi(x)| = |P(S_n \le v_n^{-1}x)-\phi(x/v_n)| \le 16\,\varepsilon^{1/2}\exp\{-x^2/(4v_n^2)\}+P(|T_n-1| > \varepsilon).
\]
Moreover, for any $\delta > 0$,
\[
P(|T_n-1| > \varepsilon) \le 4\,\varepsilon^{-1-\delta}\,E\big[|T_n - V_n^2|^{1+\delta} + |V_n^2 - 1|^{1+\delta}\big],
\]
where $T_n - V_n^2$ is a sum of martingale differences. In the same way as in [14],
we obtain (as $\delta \le 1$)



\T n -l\>e)<C. 



.-1-8 



|2+25 



+ E|K - 1 



1+5 



i=l 



which concludes the proof. 